Feature Request: Add option to disable automatic snapshot return in input tools
Is your feature request related to a problem? Please describe.
Add a skipSnapshot (or similar) parameter to input tools (click, fill, hover, press_key, etc.) to allow disabling the automatic snapshot inclusion in responses.
Describe the solution you'd like
Problem
Currently, all input tools unconditionally call response.includeSnapshot() after completing their actions:
// src/tools/input.js - click handler (line 38)
handler: async (request, response, context) => {
// ... click logic ...
response.includeSnapshot(); // Always called, no way to disable
}
This applies to all input tools:
-
click(line 38) -
hover(line 65) -
fill(line 139) -
fill_form(line 193) -
press_key(line 266) -
drag(line 163) -
upload_file(line 232)
There is no parameter to disable this behavior.
Impact
For complex pages with large DOM structures, the automatic snapshot causes severe context window overflow:
Real-world Example: Jupyter Notebook Testing
| Turn | Action | Context Before | Context After | Jump |
|---|---|---|---|---|
| 74 | click on notebook tab |
~31K tokens | ~242K tokens | +211K |
| 26 | click on "Run All Cells" |
~15K tokens | ~226K tokens | +211K |
The Jupyter notebook page contains hundreds of code cells with output, causing each snapshot to be ~200K+ tokens.
Result: Agent sessions crash with context overflow errors like:
[Context Usage]: 4502K / 200K tokens (2251.0%)
Session encountered an error
Log Evidence
[Tool: mcp__chrome-devtools__click] (after 3.5s thinking)
Input: {"uid":"12_122"}
[Done] (took 4.6s)
[Context] ~242K / 200K tokens (~121.1%) | Turn 74 <-- Explosion here
[Context Usage]: 4502K / 200K tokens (2251.0%)
[Session] Skipping session ID capture due to status: error
Use Cases for Disabling Snapshot
- Large pages: Jupyter notebooks, complex SPAs, data-heavy dashboards
- Rapid interactions: Multiple sequential clicks where intermediate snapshots are unnecessary
- Known state changes: When the agent already knows what will happen and doesn't need verification
- Token budget management: Preserving context window for more important operations
Describe alternatives you've considered
Workaround (Current)
Currently, the only workaround is to avoid browser-based interactions for large pages and use alternative methods (e.g., CLI commands, API calls), which defeats the purpose of browser automation.
Additional context
No response
I think an optimal solution here is to return a snapshot diff because the client still needs to know how the actions affected the page. Otherwise, the only option for a client to proceed is to request the snapshot manually, potentially, after each action. Could you clarify your expectations around how the client should learn about changes caused by its actions?
cc @natorion w.r.t. token reduction
Yes, snapshot diff are expected by the client. However, there are many scenarios where the client does not need to concern itself with post-click differences. I only focus on the completion of the current operation; as for the page response, the agent can perceive it through subsequent observations. So we should provide a parameter, similar to response_snapshot: 'None' | 'All' | 'Diff'
I think this feature makes sense in general. I wonder if this is something that should be configured and exposed via the function (and thus include a token tax) or if it should be configurable via config file (and thus not include a token tax). I think the default should be anyways "diff"?
I think this feature makes sense in general. I wonder if this is something that should be configured and exposed via the function (and thus include a token tax) or if it should be configurable via config file (and thus not include a token tax). I think the default should be anyways "diff"?
I prefer to use function parameters as appropriate, and user can set different parameters for each call as needed. Just set the default value to diff.