Remove all LLM generated commits before people get hurt by this nonsense

(github.com)

30 points | by shantnutiwari 12 hours ago

3 comments

amluto 1 hour ago
This is kind of a sad situation. Tridge is an excellect programmer and a very respected member of the community, and I totally get it. rsync, like most old C projects, has a lot of accumulated cruft, and things that would be nice to fix, and bugs. And those bugs come in at least three classes: semantic bugs, improper interactions with the OS, and memory safety bugs. And the author and long-time maintainer has the same problem as every other maintainer and team: not enough time to deal with everything.
And now LLMs come along, and they are so, so seductive. They will fix your bugs if you ask them to. They will even find your bugs. And they're right a remarkably large fraction of the time. It's magic! You can write an agent loop or magic harness or swarm and let them do this on their own if you want. And so you start getting through your backlog, and it's fun, and you feel good, and you let your guard down. And you start having problems:
- Your favorite LLM does not have the context that lives in your head. I use rsync because Tridge wrote a fine piece of software, and he knows how to write serious software, and I'm willing to accept that it's in C and therefore almost certainly has a safety bug or three. If I wanted to use claude-ersatz-rsync, I'd use that instead, but I really don't, TYVM.
- Remember how LLMs are right a remarkable fraction of the time? The fraction is remarkable, but it's nowhere close to 100%. (Yet? Who knows. Right now, it's DEFINITELY nowhere near 100%.)
- The training process for the current crop of LLMs does not adequately reinforce long-term maintainability of the outputs. And, for all the LLMs seem magic, they seem to love a workload in which they write code with poorly named functions and no docs and sort of assume that they can parse their own code down the road and figure out WTF is going on, and they are AT BEST only a tiny bit right. Because every project has interfaces where one module touches another, and every LLM has very limited context (larger than humans' in straight up verbatim working memory but MUCH MUCH WORSE than humans' (for now, anyway) in actual broad picture retention), and this workload doesn't work. If it did, we could give up on structured programming and just have the LLMs vomit up uncommented asm.
And so, where humans have conventions and decently named functions and ideas that you shouldn't churn your code just for funsies (at least not in a production context), LLMs do this:
https://github.com/RsyncProject/rsync/commit/30656c5e358b1c6...
Most of that is blindly changing calls do functions like do_foo(args) (which makes sense) to do_foo_at(the same args), which makes no sense. Sorry, but the world of POSIXish-targetting programers (including, presumably, Claude) knows what _at means, and it means "at" the specified directory fd. Which is not specified in the call sites. It makes no sense at all.
Buried in all that mess [0] is the implementations, which are sloppy. Seriously:
- There's a function called do_utimensat_at. Is Claude stuttering?
- There's a lovely comment in syscall.c:1660-1673 that's quite bad. It's handling strings that contain "/../" and such. If there's some actual contract that the function makes to its callers (and there surely is -- this is critical security-sensitive code), then SAY WHAT THE CONTRACT IS. Don't bury a partial explanation in a comment in the middle.
- There's a repeated pattern: In do_foobar_at(path), there is, in effect:
```
    if (!path)
        do_foobar(path);
```
Nice NULL pointer handling. Is NULL a valid argument or not? Why handle it by forwarding it to the less secure variant?
- Those nice, supposedly secure "at" variants check for paths that start with '/' and forward to the raw insecure syscall. And they don't check for .. in the middle. So what, exactly, is the special code for .. promising to do? (See above.)
I don't think more details are needed. But my take is that this whole thing is a mistake. I personally work on the sort of code where messes like this are entirely unacceptable. And using an LLM while maintaining the kind of oversight that prevents it is mentally taxing and not exactly fun.
If you want to fix all the gunk in a C program like rsync by LLM magic, go rewrite it in Rust or something -- you're already exposing yourself to a massive rewrite and all the risks that entails, and you're pretty much guaranteeing a high level of sloppiness, so at least use a language that is more resistant to slop.
[0] Which GitHub doesn't even render by default because their diff viewer is so bad.
[-]
- cyphar 46 minutes ago
  > - There's a lovely comment in syscall.c:1660-1673 that's quite bad. It's handling strings that contain "/../" and such. If there's some actual contract that the function makes to its callers (and there surely is -- this is critical security-sensitive code), then SAY WHAT THE CONTRACT IS. Don't bury a partial explanation in a a comment in the middle.
  I took a look at this earlier before seeing your comment and (as the author of openat2), I found a few other bugs[1] that you could easily imagine coming out of Claude if you didn't religiously review every edit. My impression is that the (unnecessary) ".." handling was added by Claude to provide "consistency" but it actually doesn't make sense if you think about it for more than a few minutes.
  I will say that I don't really blame Tridge for this, Claude often writes code like this and it is very hard to notice it making dumb assumptions unless you sit back and read the code as if it was submitted to you by someone else without any context from the conversation with the LLM. It is very hard to switch from "System 1" thinking (talking with an LLM) to "System 2" thinking (reviewing the code as if someone you do not trust submitted it)[2].
  This is one of the reasons I only really use LLMs for code review or writing tests to see if it can find edge cases I didn't see -- it's really hard to be sure it didn't make a reasonable-sounding-but-actually-remarkably-stupid assumption when writing code that is not something you are intimately familiar with. In those cases, actually writing the code gives you the chance to think about those edge cases and do research based on that, so even if you buy the argument that LLMs can output similar code to what you would've written (which hasn't been my experience at all) it is very hard to review changes without that background knowledge.
  [1]: https://github.com/RsyncProject/rsync/pull/939 [2]: https://en.wikipedia.org/wiki/Thinking%2C_Fast_and_Slow
  [-]
  - amluto 35 minutes ago
    Hah, you read more of it than I did :) I’m willing to believe that there is some other code path that thinks that a path like "a/../b" is not allowable, and Claude saw that and wanted to enforce it, and then forgot about it when writing the rest of its code. I didn’t try to dig in to the history, nor do I want to.
    (Of course, catching "a/../b" is not the same as catching "a/foo/b" where foo is a symlink to "..". Claude doesn’t think unless you make it think.)
    (Hi, cyphar!)
    [-]
    - cyphar 27 minutes ago
      > I’m willing to believe that there is some other code path that thinks that a path like "a/../b" is not allowable, and Claude saw that and wanted to enforce it, and then forgot about it when writing the rest of its code.
      That is also possible, but this mostly smells like a weird LLM-hallucinated assumption to me because the whole concept of blocking literal ".." components doesn't make sense if you allow symlinks -- symlinks can contain ".." and RESOLVE_BENEATH allows them if they are safe. Maybe the old code did have that faulty assumption built in (there were some security bugs related to it, after all), but one would hope that The $1T A.G.I.(TM) would be able to notice such a mistake.
      EDIT: Ah, you added a note about ".." and symlinks. Yeah, that was one of the major things that made me think the whole thing was an LLM hallucination or possibly it over-applying a critique that Tridge may have made when reviewing the code. Without access to the chat logs it's hard to know.
      (For completely unrelated reasons, I plan to send a patch for RESOLVE_NO_DOTDOT that lets you block them entirely, so you could eventually do this kind of blocking if you really wanted to but most users would prefer RESOLVE_BENEATH behaviour anyway -- blocking ".." entirely will lead to very painful regressions, as other users have already mentioned.)
      (PS: Hey Andy, long time no see! :D)
- cookiengineer 33 minutes ago
  Most problems you described come from using LLMs with high temperatures and long lifecycles.
  The point behind agentic environments and an orchestrator/planner architecture is that you can delegate defined and specified tasks to short living and low tempered agents. Well, at least if you know what you are doing, and know how to specify these things based on an AST and not markdown notes, for example.
  (Judge for yourself if Claude is the right IDE for that, given that their source code from a couple weeks ago is available)
  Languages that use pointers and have dependencies on time need strong specifications for LLMs to make sense. And that's what's missing.
  Disclaimer: I am building a pentesting and Go focused agentic IDE for short living agents, so my point of view is not using hyped tools but trying to understand harness engineering well enough to let the kids mess with your codebase.
mhitza 12 hours ago
Recently discussed https://news.ycombinator.com/item?id=48342705
Havoc 6 hours ago
Doesn't really seem like an in good faith issue filing.
Everyone has by now realized opinions on this are mixed & rather sound. What exactly does opening a "hey this is nonsense" issue add?