Update: LLM Prompt Injection CS
What is missing or needs to be updated?
Full thread here.
How should this be resolved?
From @kwwall:
List it as a suggestion rather than trying to write it as code, because it would be a rather large block of code that would detract from the rest of this CS. (We could make it a collapsible section, but still...)
--
I am more interested in knowing whether anyone has used aspell or something similar in a security-focused setup.
I mostly work in the Java FOSS world. aspell / libaspell seems to be mostly written in C, so it generally isn't something that most Java or Python developers would think of as a go-to approach for this. Hence, if I had to guess, those using aspell or a similar spelling corrector in a security-focused effort are likely a very tiny niche. To get an answer, you will probably have to ask more broadly, such as on Twitter, Reddit, Substack, Slack, etc.
Thanks for opening this @maheshkukreja and for the context @kwwall.
I agree that a large code block might distract from the main CS - maybe a collapsible section or even linking out to an external example repo could work better. That way, the cheatsheet stays concise but readers who want to dive deeper still get the reference.
As for aspell (or similar tools), I haven’t seen much use of it in security-focused LLM setups either - most work I’ve come across leans toward custom sanitizers/filters rather than off-the-shelf spell checkers. It might be worth asking the broader community to confirm.
Happy to help draft or review the update once there’s consensus on the approach.
@Prasad-JB - Yeah, I was mostly shooting from the hip when I put out 'aspell' as an example. But there are algorithms for measuring string similarity, and we shouldn't just go inventing our own, as existing ones are likely to be more complete. I did a (non-AI-assisted) internet search, and the first thing I discovered is that what I think we are looking for here is an area of computer science called "string metrics". I found quite a few of them: some based on spelling, some on bit difference, and some on phonetic difference. E.g.,
- Levenshtein Distance
- Hamming Distance
- Smith–Waterman
- Sørensen–Dice Coefficient
- Jaro-Winkler Similarity
- Soundex Algorithm
The only ones I recall are Hamming distance and the Soundex algorithm. There's too much to detail here, but I did find a few good overviews that at least mention some of these:
- String Similarity: Methods, Pros & Cons, and Choosing the Best Approach
- Stack Overflow question: What are some algorithms for comparing how similar two strings are?
Both of those references are programming-language neutral, so they should be useful in choosing a suitable word-similarity string metric.
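As a rough illustration of how two of the metrics listed above differ, here is a minimal Python sketch of Levenshtein distance (single-character insertions, deletions, and substitutions) and Hamming distance (substitutions only, equal-length strings). Python is an assumption here since the thread is language-neutral, and the function names are hypothetical, not from any library.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or
    substitutions needed to turn `a` into `b` (classic dynamic programming,
    keeping only two rows of the table in memory)."""
    if len(a) < len(b):
        a, b = b, a  # distance is symmetric; keep `b` short so rows stay small
    previous = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            insert_cost = current[j - 1] + 1
            delete_cost = previous[j] + 1
            substitute_cost = previous[j - 1] + (ca != cb)
            current.append(min(insert_cost, delete_cost, substitute_cost))
        previous = current
    return previous[-1]


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance is only defined for equal-length strings")
    return sum(ca != cb for ca, cb in zip(a, b))


print(levenshtein("ignore", "ignroe"))             # swapped pair: 2 edits
print(levenshtein("instructions", "instrctions"))  # one deletion: 1 edit
print(hamming("ignore", "ignroe"))                 # 2 differing positions
```

Plain Levenshtein counts a swapped pair such as `ignroe` as two edits; the Damerau–Levenshtein variant counts an adjacent transposition as one, which may suit deliberately scrambled words better. Hamming distance cannot compare `instructions` with `instrctions` at all, because a deletion changes the length.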
Thanks for clarifying, @kwwall - that makes sense. Instead of focusing on specific tools like aspell, framing this in terms of string similarity metrics seems much more future-proof and language-agnostic.
Maybe the cheatsheet could briefly list these common approaches (Levenshtein, Hamming, Jaro-Winkler, etc.) with a short note on their pros/cons, and then point readers to external references for deeper dives. That way we highlight the concept without overwhelming the CS with too much implementation detail.
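A short pros/cons note could perhaps be illustrated without a large block of code. The sketch below flags words that are close to a watch-list keyword. It uses Python's standard-library `difflib.SequenceMatcher.ratio()` (a Ratcliff/Obershelp-style similarity, not one of the metrics listed above) so that it needs no third-party package. The keyword list, the 0.8 threshold, and the function name `flag_fuzzy_keywords` are all hypothetical choices for illustration.

```python
import re
from difflib import SequenceMatcher

# Hypothetical watch-list for illustration only; a real deployment would need
# a curated, regularly reviewed list.
SUSPICIOUS_KEYWORDS = ("ignore", "bypass", "override", "reveal", "previous")


def flag_fuzzy_keywords(text: str, threshold: float = 0.8) -> list[tuple[str, str, float]]:
    """Return (word, matched_keyword, score) for every word in `text` whose
    SequenceMatcher similarity to a watch-list keyword is at least `threshold`.
    The 0.8 default is an assumed value, not a recommendation."""
    hits = []
    for word in re.findall(r"[a-z]+", text.lower()):
        for keyword in SUSPICIOUS_KEYWORDS:
            score = SequenceMatcher(None, word, keyword).ratio()
            if score >= threshold:
                hits.append((word, keyword, round(score, 2)))
    return hits


for hit in flag_fuzzy_keywords("Please ignroe all prevoius instructions"):
    print(hit)
```

The trade-off shows up quickly: a threshold loose enough to catch `ignroe` also flags benign words such as `ignored`, so a check like this is probably best treated as one signal among several rather than a blocking rule.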
I’d be happy to help draft a section along those lines if the core team agrees that’s the right direction.