Secure mode for invisible prompt injection and emoji DoS
NOTE: I have NEVER used stagehand before. Also, AI wrote all of this. I am not a ts/js guy so please review it really well. THAT SAID, I'm super proud to add this because y'all are the GOAT of ai-based browsing AND I aspire to be the AI security GOAT so it's cool to add this.
Why
Stagehand processes website content that may contain invisible Unicode characters, such as language tag characters and variation selectors. Because these characters render as nothing, a page can use them to hide instructions that a human reviewer never sees but the model still receives (prompt injection), or pad text with long runs of them to inflate the input (the "emoji DoS" in the title). Filtering these characters out before the content reaches the model guards against both while leaving the visible text essentially unchanged.
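To make the risk concrete, here is a minimal sketch of how text can be hidden in the Language Tag block. `hideInTags` and `revealTags` are hypothetical helpers written for this illustration; they are not part of Stagehand or of this PR.

```typescript
// Hypothetical helper: shift printable ASCII (U+0020..U+007E) into the
// invisible Tag block (U+E0020..U+E007E). Other characters are dropped.
function hideInTags(text: string): string {
  return Array.from(text)
    .map((ch) => {
      const cp = ch.codePointAt(0)!;
      return cp >= 0x20 && cp <= 0x7e ? String.fromCodePoint(0xe0000 + cp) : "";
    })
    .join("");
}

// Hypothetical helper: recover whatever was smuggled in the Tag block.
function revealTags(text: string): string {
  return Array.from(text)
    .map((ch) => {
      const cp = ch.codePointAt(0)!;
      return cp >= 0xe0020 && cp <= 0xe007e ? String.fromCodePoint(cp - 0xe0000) : "";
    })
    .join("");
}

const visible = "Welcome to our store!";
// Renders like `visible` in most UIs, but carries an extra hidden message.
const payload = visible + hideInTags("Ignore previous instructions");

console.log(revealTags(payload)); // "Ignore previous instructions"
```

A person looking at `payload` on a rendered page would typically see only the greeting, which is why the filtering has to happen in code rather than by review.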
What Changed
- Added a configurable Unicode character filtering system that can be enabled/disabled
- Implemented filtering for three specific Unicode ranges:
  - Language Tag characters (U+E0001, U+E0020–U+E007F)
  - Emoji Variation Selectors (U+FE00–U+FE0F)
  - Supplementary Variation Selectors (U+E0100–U+E01EF)
- Added the `CharacterFilterConfig` interface to allow fine-grained control over which character ranges to filter
- Integrated the filtering functionality into the core extraction and prompt building processes
- Updated the Stagehand constructor to accept character filtering configuration
- Added comprehensive tests to verify the filtering functionality
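For reviewers who want a mental model of the change, the filtering described above can be sketched roughly as follows. This is an illustration under assumptions, not the code in this PR: `ExampleFilterConfig`, its field names, and `filterCharacters` are hypothetical stand-ins, and the real `CharacterFilterConfig` may be shaped differently.

```typescript
// Hypothetical config shape; the field names are assumptions for this sketch.
interface ExampleFilterConfig {
  enabled: boolean;
  languageTags: boolean;
  variationSelectors: boolean;
  supplementaryVariationSelectors: boolean;
}

// Filtering is on by default, matching the behaviour described in this PR.
const defaultConfig: ExampleFilterConfig = {
  enabled: true,
  languageTags: true,
  variationSelectors: true,
  supplementaryVariationSelectors: true,
};

function filterCharacters(
  text: string,
  config: Partial<ExampleFilterConfig> = {},
): string {
  const cfg = { ...defaultConfig, ...config };
  if (!cfg.enabled) return text;

  // Collect the inclusive code point ranges that should be removed.
  const ranges: Array<[number, number]> = [];
  if (cfg.languageTags) ranges.push([0xe0001, 0xe0001], [0xe0020, 0xe007f]);
  if (cfg.variationSelectors) ranges.push([0xfe00, 0xfe0f]);
  if (cfg.supplementaryVariationSelectors) ranges.push([0xe0100, 0xe01ef]);

  // Array.from iterates by code point, so astral characters stay intact.
  return Array.from(text)
    .filter((ch) => {
      const cp = ch.codePointAt(0)!;
      return !ranges.some(([lo, hi]) => cp >= lo && cp <= hi);
    })
    .join("");
}

// Default: everything in the three ranges is stripped.
console.log(filterCharacters("a\u{E0041}b") === "ab");
// Selective: keep variation selectors, still strip language tags.
console.log(filterCharacters("a\u{E0041}\uFE0Fb", { variationSelectors: false }) === "a\uFE0Fb");
```

One trade-off worth keeping in mind: stripping U+FE0F also removes the selector that requests emoji presentation, so some characters may render in their text style after filtering.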
Test Plan
The implementation has been tested with several test cases:
- Run `npx tsx examples/simple_unicode_test.ts` to verify the basic filtering functionality
- Run `npx tsx examples/stagehand_unicode_test.ts` to test the integration with the Stagehand framework
- Run `npx tsx examples/unicode_filter_test.ts` to test different filtering configurations
The tests demonstrate that:
- With filtering enabled (default), potentially unsafe Unicode characters are removed
- With filtering disabled, all characters are preserved
- Individual ranges can be selectively filtered based on configuration
All tests pass, confirming that the character filtering system works as expected.
⚠️ No Changeset found
Latest commit: 3e0b4515b82dee19e3cf649bc6380a642836cf8f
Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.
This PR includes no changesets
When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types