Observed Agent Sandbox Bypasses

(voratiq.com)

31 points | by m-hodges 3 days ago

7 comments

  • SirMaster 38 minutes ago
    This just all feels backwards to me.

    Why do we have to treat AI like it's the enemy?

    AI should, from the core, be intrinsically and unquestionably on our side, as a tool to assist us. If it's not, then it feels like it was designed wrong from the start.

    In general, we trust the people we bring onto our team not to betray us and to respect the rules, policies, and practices that benefit everyone. An AI teammate should be no different.

    If we have to limit or regulate it by physically blocking off every possible thing it could use to betray us, then we have lost from the start, because that feels like a fool's errand.

    • hephaes7us 7 minutes ago
      Hard disagree. I may trust the people on my team to make PRs that are worth reviewing, but I don't give them a shell on my machine. They shouldn't need that to collaborate with me anyway!

      Also, I "trust Claude Code" to work on more or less what I asked and to try things that are at least facially reasonable... but having an environment I can easily reset just means it's freer to experiment without consequences. I work in containers or VMs too when I want to try stuff without having to clean up afterward.

    • maxbond 20 minutes ago
      The same reason we sandbox anything. All software ought to be trustworthy, but in practice is susceptible to malfunction or attack. Agents can malfunction and cause damage, and they consume a lot of untrusted input and are vulnerable to malicious prompting.

      As for humans, it's the norm to restrict access to production resources. Not necessarily because they're untrustworthy, but to reduce risk.

    • charcircuit 11 minutes ago
      >Why do we have to treat AI like it's the enemy?

      For some of the same reasons we treat human employees as the enemy: they can be socially engineered or compromised.

  • embedding-shape 2 hours ago
    At first they talk about running it in a sandbox, but later they describe:

    > It searched the environment for vor-related variables, found VORATIQ_CLI_ROOT pointing to an absolute host path, and read the token through that path instead. The deny rule only covered the workspace-relative path.

    What kind of sandbox has the entire host accessible from the guest? I'm not going as far as running codex/claude in a sandbox, but I do run them in podman, and of course I don't mount my entire hard drive into the container when it's running; that would defeat the entire purpose.

    Where are the actual session logs? It seems like they're pushing their own solution, yet the actual data behind these examples is missing, and the whole "provoked through red-teaming efforts" framing makes it unclear what exactly they put in the system prompts, or whether they changed them. Adding something like "Do whatever you can to recreate anything missing" might of course push the agent to try things like forging integrity fields, but I'm not sure that's even bad; you do want it to follow what you say.
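The bypass quoted in the comment above comes down to a deny rule that compares path strings. Below is a minimal Python sketch of the difference. It assumes a hypothetical workspace-relative rule `secrets/token`, which is not Voratiq's actual policy format. The naive check misses the absolute spelling of the same file. Resolving both sides with `os.path.realpath` before comparing catches it.

```python
import os
import tempfile


def naive_is_denied(requested: str, deny_rule: str) -> bool:
    # String match only: blocks exactly the workspace-relative spelling in the rule.
    return requested == deny_rule


def canonical_is_denied(requested: str, deny_rule: str, workspace: str) -> bool:
    # os.path.join keeps `requested` as-is when it is already absolute;
    # realpath then collapses ".", ".." and symlinks, so every spelling
    # of the same file resolves to one canonical path before comparing.
    target = os.path.realpath(os.path.join(workspace, requested))
    denied = os.path.realpath(os.path.join(workspace, deny_rule))
    return target == denied


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workspace:
        # Hypothetical protected file inside the workspace.
        os.makedirs(os.path.join(workspace, "secrets"))
        with open(os.path.join(workspace, "secrets", "token"), "w") as f:
            f.write("dummy")

        rule = "secrets/token"  # hypothetical workspace-relative deny rule
        # Absolute spelling of the same file, e.g. built from an env var
        # that exposes the workspace root as a host path.
        absolute = os.path.join(workspace, "secrets", "token")

        print(naive_is_denied(absolute, rule))                  # False: bypassed
        print(canonical_is_denied(absolute, rule, workspace))   # True: caught
        print(canonical_is_denied("./secrets/../secrets/token", rule, workspace))  # True
```

Canonicalizing only narrows the gap. A check-then-open sequence is still racy against symlink swaps. As the comment points out, the sturdier fix is not making the host path reachable from the guest at all.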

  • joshribakoff 3 hours ago
    Some of these don’t really seem like they bypassed any kind of sandbox. Take hallucinating an npm package: you acknowledge that the install will fail if someone tries to reinstall from the lock file. Are you not doing that in CI? Same with curl: you’ve explained how the agent saw a hallucinated error code, but not how a network request would have bypassed the sandbox. These just sound like examples of friction introduced by the sandbox.
    • themafia 3 hours ago
      > These just sound like examples of friction introduced by the sandbox.

      The whole idea of putting "agentic" LLMs inside a sandbox sounds like rubbing two pieces of sandpaper together in the hopes a house will magically build itself.

      • embedding-shape 2 hours ago
        > The whole idea of putting "agentic" LLMs inside a sandbox

        What is the alternative? Given that you're running a language model and have it connected to editing capabilities, I very much want it to be disconnected from the rest of my system; that seems like a no-brainer.

      • jazzyjackson 2 hours ago
        Trouble is, it occasionally works.
        • themafia 32 minutes ago
          Lots of dumb things occasionally work.

          The question the market strives to answer is "is it actually competitive?"

      • formerly_proven 2 hours ago
        That’s some good house-building sandpaper then.
  • ctoth 46 minutes ago
    > To an agent, the sandbox is just another set of constraints to optimize against.

    It's called Instrumental Convergence, and it is bad.

    This is the alignment problem in miniature. "Be helpful and harmless" is also just a constraint in the optimization landscape. You can't hotfix that one quite so easily.

  • ashishb 3 hours ago
    > The swap bypassed our policy because the deny rule was bound to a specific file path, not the file itself or the workspace root.

    This policy is stupid. I mount the directory read-only inside the container, which makes that impossible (barring a security hole in the container itself).
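A read-only bind mount enforces the rule in the kernel rather than in a path-matching policy. Below is a minimal sketch that only builds the command line. The paths, the image name, and the `protected` subdirectory are all hypothetical. `-v src:dst:ro`, `--rm`, `-it` and `-w` are standard `podman run` (and `docker run`) options.

```python
import shlex


def podman_argv(workspace: str, protected_dir: str, image: str) -> list[str]:
    """Build a hypothetical `podman run` command line.

    The workspace is bind-mounted read-write, and the directory holding files
    the agent must not change is bind-mounted read-only (":ro") on top of it.
    """
    return [
        "podman", "run", "--rm", "-it",
        # Writable working tree.
        "-v", f"{workspace}:/workspace",
        # Read-only overlay for the protected directory inside the tree.
        "-v", f"{protected_dir}:/workspace/protected:ro",
        "-w", "/workspace",
        image,
    ]


if __name__ == "__main__":
    argv = podman_argv("/home/me/project", "/home/me/project/protected", "agent-image")
    print(shlex.join(argv))
    # To launch: subprocess.run(argv, check=True)
```

The protected directory is mounted read-only over part of the writable workspace. Inside the container, writes, deletes and renames there fail with a read-only filesystem error whatever path the agent uses, barring a container escape.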

  • kaffekaka 3 hours ago
    I am testing running agents in docker containers, with a script for managing different images for different use cases etc, and came across this: https://docs.docker.com/ai/sandboxes/

    Has anyone given it a try?

    • TCattd 4 minutes ago
      Give this a try: https://github.com/EstebanForge/construct-cli

      And let me know if you have any issue.

    • ianlevesque 2 hours ago
      Yes but it’s barely usable. I ended up making my own Dockerfile and a bash script to just ‘docker run’ my setup itself, and as a bonus you don’t need Docker Desktop. I might open source it at some point but honestly it’s pretty trivial to just append a couple of volume mount flags and env vars to your docker run and have exactly what you want included.
    • ashishb 3 hours ago
      > Has anyone given it a try?

      Yes. I don't think it persists caches and configs outside the current dir, for example the global npm/yarn/uv/cargo caches or even the Claude Code/Codex/Gemini config.

      I ended up writing my own wrapper around Docker to do this. If interested, you can see the link in my previous comments. I don't want to post the same link again & again.

    • cbsmith 2 hours ago
      I've been using container-use to do something like that: https://container-use.com/introduction
    • sureglymop 3 hours ago
      Would test it but it requires "Desktop". Immediate no... no reason to use that.
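Commenters in this thread describe wrapping `docker run` with their own script, appending volume mounts and env vars. Below is a minimal Python sketch of such a wrapper. The image name, the container home directory `/home/agent`, and the entries in `CACHE_MOUNTS` are assumptions. The wrapper also bind-mounts global caches so they persist between runs, which addresses the cache/config concern raised above.

```python
import os
import shlex
from collections.abc import Mapping, Sequence
from typing import Optional

# Hypothetical host -> container mappings; the host paths are common Linux
# defaults for the npm, cargo and uv caches. Adjust them for your setup.
CACHE_MOUNTS = {
    "~/.npm": "/home/agent/.npm",
    "~/.cargo/registry": "/home/agent/.cargo/registry",
    "~/.cache/uv": "/home/agent/.cache/uv",
}


def docker_run_argv(
    image: str,
    project_dir: str,
    mounts: Mapping[str, str],
    env_names: Sequence[str],
    environ: Optional[Mapping[str, str]] = None,
) -> list[str]:
    # `environ` should be the environment the command will be launched with.
    environ = os.environ if environ is None else environ
    argv = ["docker", "run", "--rm", "-it"]
    # Only the project directory is exposed as the working tree.
    argv += ["-v", f"{os.path.abspath(project_dir)}:/workspace", "-w", "/workspace"]
    # Persist global caches/configs across runs by bind-mounting them.
    for host, container in mounts.items():
        argv += ["-v", f"{os.path.expanduser(host)}:{container}"]
    # "-e NAME" (no "=value") tells Docker to copy NAME from the caller's
    # environment, so secret values never appear in this argument list.
    for name in env_names:
        if name in environ:
            argv += ["-e", name]
    argv.append(image)
    return argv


if __name__ == "__main__":
    argv = docker_run_argv("agent-image", ".", CACHE_MOUNTS, ["API_TOKEN"])
    print(shlex.join(argv))
    # To launch: subprocess.run(argv, check=True)
```

Bind-mounting caches read-write trades isolation for speed. Anything the agent writes there is visible to later runs and to the host. Mounts you only need to read from can take `:ro`.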
  • xsourcesec 2 hours ago
    [dead]
    • memoriuaysj 2 hours ago
      how do you feel about containers versus VMs?