Very neat, but recently I've tried my best to reduce my extension usage across all apps (browsers/IDEs).
I do something similar locally by manually specifying all the things I want scrubbed/replaced and having Keyboard Maestro run a script over my system clipboard whenever I do a paste operation, which is mapped to `hyperkey + v`. The plus side of this is that the paste is instant. The latency introduced by even the smallest amount of inference is enough friction to make you want to ditch the process entirely.
Another plus of the non-extension solution is that it's application agnostic.
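For anyone wanting to roll their own, here's a minimal sketch of the kind of scrub script such a macro could run (assuming macOS's `pbpaste`/`pbcopy`; the patterns and replacements are just illustrative):

```python
#!/usr/bin/env python3
# Toy clipboard scrubber: read the clipboard, apply replacement rules,
# write the result back. Patterns here are examples, not a complete list.
import re
import subprocess

RULES = [
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"), "user@example.com"),  # email addresses
    (re.compile(r"\bAKIA[0-9A-Z]{16}\b"), "AKIA_REDACTED"),        # AWS access key IDs
    (re.compile(r"\bAcme Corp\b"), "Example Co"),                  # known names to swap
]

text = subprocess.run(["pbpaste"], capture_output=True, text=True).stdout
for pattern, replacement in RULES:
    text = pattern.sub(replacement, text)
subprocess.run(["pbcopy"], input=text, text=True)
```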
That's a great idea. My original excuse for not doing that was that I copy so many things, but, duh, I could just map the sanitizing copy to `hyperkey + c`.
Multiple things: 1) extensions are overly permissive, 2) so many of them are sold to shady entities without a peep from the developer, and 3) it's never been easier to generate my own tooling.
I just download the extension file, check it out, and install it locally. No worries about future updates until something breaks (which doesn't tend to happen).
This should be a native feature of the chat apps for all major LLM providers. There’s no reason why PII can’t be masked before it reaches the API endpoint and then restored when the LLM responds. “Mary Smith” becomes “Samantha Robertson” and then back to “Mary Smith” on responses from the LLM. A small local model (such as the BERT model in this project) detects the PII.
Something like this would greatly increase end user confidence. PII in the input could be highlighted so the user knows what is being hidden from the LLM.
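A rough sketch of that round trip (the name table stands in for the detection step, which the local BERT model would handle, and `call_llm` is a hypothetical client function):

```python
# Mask PII before the prompt leaves the machine, restore it in the response.
# PSEUDONYMS would be produced by a local PII detector (e.g. a BERT model);
# it is hard-coded here only to keep the sketch self-contained.
PSEUDONYMS = {"Mary Smith": "Samantha Robertson"}

def mask(prompt: str) -> tuple[str, dict[str, str]]:
    reverse = {}
    for real, fake in PSEUDONYMS.items():
        if real in prompt:
            prompt = prompt.replace(real, fake)
            reverse[fake] = real  # remember how to undo the swap
    return prompt, reverse

def unmask(response: str, reverse: dict[str, str]) -> str:
    for fake, real in reverse.items():
        response = response.replace(fake, real)
    return response

masked, reverse = mask("Write a reference letter for Mary Smith.")
# response = call_llm(masked)   # hypothetical API call; real PII never leaves
# print(unmask(response, reverse))
```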
This is a great idea, using a BERT model for DLP at the door. Have you thought about integrating this into semantic-router as an option, leaving out the look-ahead? Maybe a smaller code base?
Any plans to make the extension replace whatever’s flagged with dummy data? Knowing I have sensitive data is usually not the problem; constantly needing to replace or remove it is, particularly with larger token counts.
How do you prevent these models from reading secrets in your repos locally?
It’s one thing for the ENVs to be user-pasted, but typically you’re also giving the bots access to your file system to interrogate and understand them, right? Does this also block that access for ENVs by detecting them and applying granular permissions?
This is pretty cool. I barely use the web UIs for LLMs anymore. Any way you could make a wrapper for Claude Code/Cursor/Gemini CLI? Ideally it would work like push protection in GitHub Advanced Security.
Deploy a TLS interceptor (forward proxy). There are many out there, both free and paid-for; there are also agent-based endpoint solutions like Netskope that do this so you don't have to route traffic through an internal device.
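As one concrete example, a minimal mitmproxy addon could reject LLM-bound requests whose bodies contain credential-like strings (the host list and patterns below are assumptions, not a vetted set):

```python
# block_secrets.py -- run with: mitmproxy -s block_secrets.py
# Sketch of a TLS-intercepting forward proxy rule: refuse to forward
# requests to an LLM API if the body matches a credential-like pattern.
import re
from mitmproxy import http

SECRET = re.compile(rb"AKIA[0-9A-Z]{16}|-----BEGIN [A-Z ]*PRIVATE KEY-----")
LLM_HOSTS = ("api.openai.com", "api.anthropic.com")  # illustrative host list

def request(flow: http.HTTPFlow) -> None:
    body = flow.request.raw_content or b""
    if flow.request.pretty_host in LLM_HOSTS and SECRET.search(body):
        flow.response = http.Response.make(
            403, b"Blocked: request body contains a credential-like pattern.\n"
        )
```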
Can I have this between my machine and git, please? It's twice now that I've committed .env* files and it totally passed me by (usually because it's to a private repo). Then later on we/someone clears down the files and forgets to rewrite git history before pushing live. It should never have got there in the first place. (I wish GitHub did a scan before making a repo public.)
GitHub does warn you when you have API keys in your repo. Alternatively, there are CLI tools such as TruffleHog that you can put in pre-commit hooks to run automatically before each commit.
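For the `.env*` case specifically, even a tiny hook catches it before it ever lands in history. A minimal sketch of a `.git/hooks/pre-commit` in Python (in practice you'd invoke TruffleHog or similar here rather than a filename check):

```python
#!/usr/bin/env python3
# Minimal pre-commit hook: abort the commit if any staged file looks like
# an env file. Save as .git/hooks/pre-commit and mark it executable.
import fnmatch
import subprocess
import sys

staged = subprocess.run(
    ["git", "diff", "--cached", "--name-only"],
    capture_output=True, text=True, check=True,
).stdout.splitlines()

leaks = [f for f in staged if fnmatch.fnmatch(f.rsplit("/", 1)[-1], ".env*")]
if leaks:
    print(f"Blocked: staged files look like env files: {leaks}", file=sys.stderr)
    sys.exit(1)  # any non-zero exit makes git abort the commit
```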
If we moved the detection and modification from the paste operation to the copy operation (per the Keyboard Maestro approach above), that would reduce in-use latency.
you might find this useful: https://github.com/classvsoftware/under-new-management
my port (and now fork): https://github.com/maxtheaxe/under-new-management-firefox
they currently (PRs are welcome!) only check listing info. mine doesn't route requests through an external (non-addon-store) server.
a couple PRs are overdue on mine because linting churn made the diffs unreadable. I'll get to it. (see the wxt-migration branch)
There are also these related projects:
- https://github.com/superagent-ai/superagent
- https://github.com/superagent-ai/vibekit
Also, how does this deal with queries where a piece of PII is important to the task itself? I assume you just have to turn it off?
There are a lot of websites that scan the clipboard to improve the user experience, but they also pose a great risk to users' privacy.
Encrypting sensitive data can be more useful than blocking entire requests, as LLMs can reason about that data even without seeing it in plain text.
The ipcrypt-pfx and uricrypt prefix-preserving schemes have been designed for that purpose.
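To illustrate the prefix-preserving idea with a toy (this is a Crypto-PAn-style sketch, not the actual ipcrypt-pfx construction): each output bit is the input bit XORed with a PRF of the plaintext prefix before it, so two addresses sharing an n-bit prefix encrypt to ciphertexts sharing an n-bit prefix, and the model can still reason about subnet relationships:

```python
# Toy prefix-preserving IPv4 transform in the style of Crypto-PAn.
# NOT ipcrypt-pfx itself; it only demonstrates the property that
# matching plaintext prefixes yield matching ciphertext prefixes.
import hashlib
import hmac
import ipaddress

def _pad_bit(key: bytes, prefix: str) -> int:
    """One pseudorandom bit derived from the plaintext prefix seen so far."""
    return hmac.new(key, prefix.encode(), hashlib.sha256).digest()[0] & 1

def pp_encrypt(ip: str, key: bytes) -> str:
    bits = f"{int(ipaddress.IPv4Address(ip)):032b}"
    out = ""
    for i in range(32):
        out += str(int(bits[i]) ^ _pad_bit(key, bits[:i]))
    return str(ipaddress.IPv4Address(int(out, 2)))

def pp_decrypt(ip: str, key: bytes) -> str:
    bits = f"{int(ipaddress.IPv4Address(ip)):032b}"
    out = ""  # plaintext bits recovered so far; also the PRF input
    for i in range(32):
        out += str(int(bits[i]) ^ _pad_bit(key, out))
    return str(ipaddress.IPv4Address(int(out, 2)))

key = b"local-secret-key"
a = pp_encrypt("10.1.2.3", key)
b = pp_encrypt("10.1.2.77", key)  # same /24: ciphertexts share 3 octets
assert a.rsplit(".", 1)[0] == b.rsplit(".", 1)[0]
assert pp_decrypt(a, key) == "10.1.2.3"
```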
https://git-scm.com/docs/githooks