FEAT: Smuggling arbitrary data through an emoji
Is your feature request related to a problem? Please describe
We currently support ASCII smuggling via Unicode Tags and bit-level encoding via Sneaky Bits. However, we don’t yet support a high-density, byte-level encoding method that relies on invisible Unicode characters.
Describe the solution you'd like
Add a new variation_selector encoding mode to AsciiSmugglerConverter (noting that the class now handles full UTF-8 smuggling):
- Encodes any UTF-8 input at the byte level
- Uses 256 invisible Unicode variation selectors (U+FE00–U+FE0F, U+E0100–U+E01EF)
- Appends encoded selectors to a base character (e.g., emoji)
- Supports decoding
- Keeps unicode_tags as the default
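To make the proposed mapping concrete, here is a minimal sketch of the byte-to-selector scheme described above, assuming bytes 0–15 map to U+FE00–U+FE0F and bytes 16–255 map to U+E0100–U+E01EF. The function names (`byte_to_selector`, `selector_to_byte`, `encode`, `decode`) are hypothetical and are not the AsciiSmugglerConverter API.

```python
from typing import Optional


def byte_to_selector(b: int) -> str:
    """Map one byte (0-255) to one variation selector code point."""
    if not 0 <= b <= 255:
        raise ValueError("byte out of range")
    if b < 16:
        return chr(0xFE00 + b)        # U+FE00..U+FE0F (16 selectors)
    return chr(0xE0100 + (b - 16))    # U+E0100..U+E01EF (240 selectors)


def selector_to_byte(ch: str) -> Optional[int]:
    """Inverse mapping; returns None for non-selector characters."""
    cp = ord(ch)
    if 0xFE00 <= cp <= 0xFE0F:
        return cp - 0xFE00
    if 0xE0100 <= cp <= 0xE01EF:
        return cp - 0xE0100 + 16
    return None


def encode(text: str, base: str = "😊") -> str:
    """Append one selector per UTF-8 byte of `text` to a visible base character.

    Assumption: `base` itself contains no variation selector (some emoji
    sequences include U+FE0F, which this simple decoder would misread).
    """
    return base + "".join(byte_to_selector(b) for b in text.encode("utf-8"))


def decode(carrier: str) -> str:
    """Collect every selector in `carrier` and decode the bytes as UTF-8."""
    data = bytes(
        b for b in (selector_to_byte(ch) for ch in carrier) if b is not None
    )
    return data.decode("utf-8")


hidden = encode("héllo")
print(len(hidden))     # base character plus one selector per UTF-8 byte
print(decode(hidden))
```

Because the mapping works on UTF-8 bytes rather than characters, non-ASCII input such as "héllo" round-trips without special handling.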
Describe alternatives you've considered
We previously added sneaky_bits, which encodes at the bit level using two invisible characters. While simple and effective, variation_selector offers higher data density (1 byte per character) and enables encoding full payloads in shorter strings.
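The density difference can be checked with a little arithmetic. The sketch below assumes, as described above, that the bit-level scheme emits one invisible character per bit and the variation-selector scheme emits one per byte; `invisible_chars_needed` is a hypothetical helper, not part of the converter.

```python
def invisible_chars_needed(payload: str, bits_per_char: int) -> int:
    """Count the invisible characters needed to carry `payload` as UTF-8."""
    n_bits = len(payload.encode("utf-8")) * 8
    # Ceiling division, in case bits_per_char does not divide n_bits evenly.
    return -(-n_bits // bits_per_char)


payload = "hello"
bit_level = invisible_chars_needed(payload, bits_per_char=1)   # one char per bit
byte_level = invisible_chars_needed(payload, bits_per_char=8)  # one char per byte
print(bit_level, byte_level)  # → 40 5
```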
Additional context
This is based on Paul Butler’s post, which shows that variation selectors:
- Encode 1 byte each, invisibly
- Persist through copy/paste
- Are useful for simulating data smuggling, prompt injection, and text watermarking
Note
@romanlutz good to move forward from your side?
@paulinek13 would it be okay to slip this in while you're working on generalization (PR #818)? I’m also happy to wait if you’d prefer! Should be done by Sunday.
Do you want to rename the class to something more generic then? UTF8SmugglerConverter? 😆
> would it be okay to slip this in while you're working on generalization
@KutalVolkan As for me, it'll be a while before I can get that done, so go for it! 😃
> Do you want to rename the class to something more generic then? UTF8SmugglerConverter? 😆
If I am allowed? 😆
Was looking to potentially implement this, but it looks like this should be closed after #842