stagehand: Try giving `act` a `removeBlocker` tool to limit how much freedom we give the LLM

The team had some back and forth about how "agentic" our action steps should be. For example, if a human or agent instructs Stagehand to "search for baseball tickets" on a site, should Stagehand only take the next step of typing into the search box, or should it also submit the search? The more generality we give the model, the lower its accuracy. This would also start to conflict with the agentic behavior that people using Stagehand are supposed to figure out themselves.

One way to address this is to shift the framing of the tools we provide stagehand during actions. We could introduce a removeBlocker tool, which would be scripted to remove things like popovers or dialogs when they represent the majority of the DOM. In our limited experience this is the most common way that a goal can be blocked.
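To make "represent the majority of the DOM" measurable, one option is to compute how much of the viewport an overlay's bounding box covers. This is a minimal sketch that assumes viewport coverage as the metric and a 50% threshold; `Rect`, `viewportCoverage`, and `coversMajority` are hypothetical names, not Stagehand APIs.

```typescript
// Hypothetical helpers, not part of Stagehand. Assumes "majority" means
// more than 50% of the visible viewport area.
interface Rect {
  top: number;
  left: number;
  width: number;
  height: number;
}

// Fraction of the viewport (0..1) covered by `rect`, after clipping the
// rect to the viewport so off-screen parts are not counted.
function viewportCoverage(rect: Rect, vw: number, vh: number): number {
  if (vw <= 0 || vh <= 0) return 0;
  const left = Math.max(0, rect.left);
  const top = Math.max(0, rect.top);
  const right = Math.min(vw, rect.left + rect.width);
  const bottom = Math.min(vh, rect.top + rect.height);
  const visible = Math.max(0, right - left) * Math.max(0, bottom - top);
  return visible / (vw * vh);
}

// True when the overlay covers strictly more than `threshold` of the viewport.
function coversMajority(rect: Rect, vw: number, vh: number, threshold = 0.5): boolean {
  return viewportCoverage(rect, vw, vh) > threshold;
}
```

Clipping matters here: a banner positioned partly off-screen should only count for the part the user can actually see.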

If that proves effective, we can make the `act` prompt more specific, limiting it to steps that achieve the goal and preventing "wandering", which is undesirable at our level of abstraction.

Jun 08 '24 18:06 jeremypress

One of the peeler runs failed because, instead of clicking the "x", it chose another option in the popup and then neglected to remove the rest of the blocker. removeBlocker could address that!

Jun 11 '24 00:06 jeremypress

I like the idea. A dedicated removeBlocker tool is a simple, well-scoped way to reduce wandering: Stagehand handles the most common non-goal blockers (large popovers, modals, cookie banners) itself while the main action intent stays narrow. It also keeps the main act prompt focused ("do steps that directly achieve the goal") and delegates the "remove UI noise" policy to a small, auditable tool.

Proposed behaviour (high level)

Tool name: removeBlocker

Intent: detect and remove/close UI elements that prevent continuing the user flow (large modals, full-screen overlays, intrusive popovers).

Return value: { removed: boolean, method: 'click'|'esc'|'remove'|'none', selector?: string, note?: string }

Safety: conservative by default — only act on elements that meet both size/position and typical modal attributes (role=dialog, fixed/sticky, high z-index); allow opt-in aggressive mode.
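The conservative/aggressive split could be expressed as a single predicate over per-element signals. This sketch assumes the 30%-of-viewport threshold from the implementation sketch, treats `z-index >= 1000` as "high" (an assumption), and also accepts the standard ARIA `alertdialog` role; `BlockerCandidate` and `isBlockerCandidate` are hypothetical names.

```typescript
// Hypothetical candidate shape (not a Stagehand type): the per-element
// signals the implementation sketch collects, with area pre-normalized.
interface BlockerCandidate {
  fixed: boolean;        // computed position is fixed or sticky
  role: string | null;   // ARIA role attribute, e.g. "dialog"
  z: number;             // parsed z-index, 0 when "auto"
  viewportShare: number; // fraction of the viewport covered, 0..1
}

const LARGE_SHARE = 0.3; // the "30% of viewport" heuristic
const HIGH_Z = 1000;     // assumption: what counts as a "high" z-index

function isBlockerCandidate(c: BlockerCandidate, aggressive = false): boolean {
  const large = c.viewportShare >= LARGE_SHARE;
  const isDialog = c.role === "dialog" || c.role === "alertdialog";
  const highZ = c.z >= HIGH_Z;
  if (aggressive) {
    // Opt-in mode: any single signal is enough.
    return large || isDialog || highZ;
  }
  // Conservative default: require a large footprint AND a modal-like attribute.
  return large && (isDialog || c.fixed || highZ);
}
```

A large, non-fixed layout wrapper (say 80% of the viewport) passes only in aggressive mode, which is exactly the kind of false positive the conservative default is meant to avoid.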

API (suggested)

```typescript
// Called by the runtime (Stagehand) when an action fails due to a visible blocker,
// or proactively when a page load is blocked.
type RemoveBlockerOpts = {
  aggressive?: boolean;        // default false: don't remove small/unknown overlays
  allowedSelectors?: string[]; // whitelist of selectors that are safe to remove
  domainWhitelist?: string[];  // domains on which aggressive removals are allowed
};

type RemoveBlockerResult = {
  removed: boolean;
  method: 'click' | 'esc' | 'remove' | 'none';
  selector?: string;
  note?: string;
};

async function removeBlocker(page, opts: RemoveBlockerOpts = {}): Promise<RemoveBlockerResult> { ... }
```
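The API comment says the runtime calls removeBlocker when an action fails because of a blocker. A minimal sketch of that retry loop follows, under the simplifying assumption that any action error might be a blocker (a real version would likely inspect the error first); `withBlockerRetry` is a hypothetical helper, not an existing Stagehand function.

```typescript
// Restated here so the sketch is self-contained.
type RemoveBlockerResult = {
  removed: boolean;
  method: "click" | "esc" | "remove" | "none";
  selector?: string;
  note?: string;
};

// Run `action`; if it throws, ask `removeBlocker` to clear the page and retry,
// but only when something was actually removed. Bounding the retries keeps the
// runtime from looping on a blocker it cannot clear.
async function withBlockerRetry<T>(
  action: () => Promise<T>,
  removeBlocker: () => Promise<RemoveBlockerResult>,
  maxAttempts = 1,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= maxAttempts; attempt++) {
    try {
      return await action();
    } catch (err) {
      lastError = err;
      if (attempt === maxAttempts) break;
      const result = await removeBlocker();
      if (!result.removed) break; // nothing to clear: surface the original error
    }
  }
  throw lastError;
}
```

Keeping the retry in the runtime (rather than letting the LLM decide to "try again") is in the same spirit as the proposal: the model never gets an open-ended loop.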

Implementation sketch (Playwright / Puppeteer style)

```typescript
// rough sketch: move into lib/utils/removeBlocker.ts
async function removeBlocker(
  page,
  { aggressive = false, allowedSelectors = [], domainWhitelist = [] }: RemoveBlockerOpts = {},
) {
  // 1) find candidates: elements with fixed/sticky position OR role=dialog,
  //    or elements whose computed z-index is far above the rest of the page
  const candidates = await page.evaluate(() => {
    const vp = { w: innerWidth, h: innerHeight };
    const els = Array.from(document.querySelectorAll('body *'));
    return els.map(el => {
      const r = el.getBoundingClientRect();
      const style = getComputedStyle(el);
      // read the class attribute: el.className is not a string on SVG elements
      const classes = (el.getAttribute('class') || '').trim().split(/\s+/).filter(Boolean);
      return {
        selector:
          el.tagName.toLowerCase() +
          (el.id ? `#${CSS.escape(el.id)}` : '') +
          classes.map(c => `.${CSS.escape(c)}`).join(''),
        fixed: style.position === 'fixed' || style.position === 'sticky',
        role: el.getAttribute('role'),
        ariaHidden: el.getAttribute('aria-hidden'),
        z: parseInt(style.zIndex, 10) || 0,
        area: r.width * r.height,
        viewportShare: (r.width * r.height) / (vp.w * vp.h), // input to the 30% rule below
        rect: { top: r.top, left: r.left, width: r.width, height: r.height },
      };
    });
  });

  // 2) choose candidate(s) heuristically: area >= 30% of viewport OR role=dialog OR high z-index
  // Attempt in order: find close button -> send Escape -> remove element via DOM remove()
  // (implementation details omitted)
}
```

(The implementation should attempt, in this order: click dismiss/close buttons matched by selectors like `[aria-label~="close" i]`, `[role=button][aria-label~="dismiss" i]`, or common × buttons; send Escape; and, as a last resort, call `element.remove()`.)
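The click, Escape, remove escalation can be written independently of the browser library by hiding each step behind a small driver interface. This is a sketch under that assumption: `BlockerDriver` and `dismissCandidate` are hypothetical, and with Playwright the methods might map onto `locator.click()`, `page.keyboard.press("Escape")`, and `locator.evaluate(el => el.remove())`.

```typescript
type RemoveBlockerResult = {
  removed: boolean;
  method: "click" | "esc" | "remove" | "none";
  selector?: string;
  note?: string;
};

// Hypothetical per-candidate driver; each method wraps one browser operation.
interface BlockerDriver {
  clickCloseButton(): Promise<boolean>; // true if a close/dismiss control was found and clicked
  pressEscape(): Promise<void>;
  isStillBlocking(): Promise<boolean>;
  removeNode(): Promise<void>;
}

async function dismissCandidate(
  driver: BlockerDriver,
  selector: string,
  allowRemove: boolean,
): Promise<RemoveBlockerResult> {
  // 1) Prefer the page's own close control: it keeps app state and focus consistent.
  if ((await driver.clickCloseButton()) && !(await driver.isStillBlocking())) {
    return { removed: true, method: "click", selector };
  }
  // 2) Many modal implementations close on Escape.
  await driver.pressEscape();
  if (!(await driver.isStillBlocking())) {
    return { removed: true, method: "esc", selector };
  }
  // 3) Last resort: detach the node, only when the caller permits it.
  if (allowRemove) {
    await driver.removeNode();
    return { removed: true, method: "remove", selector, note: "detached via element.remove()" };
  }
  return { removed: false, method: "none", selector, note: "close button and Escape did not clear it" };
}
```

Checking `isStillBlocking()` after each step is what would have caught the peeler failure described earlier: clicking some option inside the popup does not count as success unless the blocker is actually gone.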

Prompt / act change

Make `act` strict: it should only take steps that "directly achieve the user's high-level goal", and it should see `removeBlocker` as an available tool instead of the prompt describing manual DOM surgery. Example prompt snippet:

You are given tools: type, click, removeBlocker, waitFor, etc. Only use actions that clearly progress the goal. If a large modal/popover blocks progress, call removeBlocker instead of attempting free-form exploration.

This limits LLM freedom and centralises the risky UI-manipulation logic in a deterministic tool.
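Exposing removeBlocker to the model means giving it a tool definition. The sketch below assumes an OpenAI-style function-calling schema (other providers use a similar JSON Schema shape); the `reason` parameter is hypothetical and exists only to feed the audit log.

```typescript
// Hypothetical tool definition in the OpenAI function-calling shape.
const removeBlockerTool = {
  type: "function",
  function: {
    name: "removeBlocker",
    description:
      "Close or remove a large modal, overlay, or popover that blocks progress toward the goal. " +
      "Use this instead of exploring or clicking options inside the popup.",
    parameters: {
      type: "object",
      properties: {
        reason: {
          type: "string",
          description: "Short explanation of what is blocking the goal (logged for auditing).",
        },
      },
      required: ["reason"],
      additionalProperties: false,
    },
  },
} as const;
```

Note that `aggressive` is deliberately not a model-visible parameter: whether removal may be aggressive stays in runtime config, which keeps the LLM's freedom limited as the issue intends.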

Edge cases & safety

Cookie consent / legal dialogs: these may be important. Default behaviour should be conservative: do not remove an element if its text includes "cookie" or "privacy", unless aggressive=true or its selector is whitelisted.

Login gates or multi-step flows: don't remove overlays that are the primary UX for continuing the flow (unless the caller explicitly allows it).

Accessibility: prefer clicking close buttons or sending ESC over removing nodes; DOM removal is a last resort.

Telemetry / audit trail: log every blocker we remove or close (selector, method, URL, timestamp) so we can inspect false positives.
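The cookie/privacy rule and the audit trail could look like the following sketch. The keyword list mirrors the two words named in the rule; exact-string selector matching for the whitelist and the `BlockerAuditEntry` shape are assumptions, and all names are hypothetical.

```typescript
// Hypothetical helpers; keyword list taken from the rule above and easy to extend.
const LEGAL_KEYWORDS = ["cookie", "privacy"];

type Method = "click" | "esc" | "remove" | "none";

interface GuardOptions {
  aggressive: boolean;
  whitelistSelectors: string[];
}

// Returns null when removal may proceed, otherwise the reason it was skipped.
function legalDialogSkipReason(text: string, selector: string, opts: GuardOptions): string | null {
  if (opts.aggressive || opts.whitelistSelectors.includes(selector)) return null;
  const lower = text.toLowerCase();
  const hit = LEGAL_KEYWORDS.find((k) => lower.includes(k));
  return hit ? `text mentions "${hit}"; skipped in conservative mode` : null;
}

interface BlockerAuditEntry {
  url: string;
  selector: string;
  method: Method;
  timestamp: string; // ISO 8601, UTC
  note?: string;
}

// Builds one audit record; `now` is injectable so tests are deterministic.
function auditEntry(
  url: string,
  selector: string,
  method: Method,
  note?: string,
  now: Date = new Date(),
): BlockerAuditEntry {
  const entry: BlockerAuditEntry = { url, selector, method, timestamp: now.toISOString() };
  if (note) entry.note = note;
  return entry;
}
```

Logging skipped candidates too (with `method: "none"` and the skip reason as `note`) would make false negatives as visible as false positives.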

Tests

Unit tests that simulate:

large modal with role=dialog and close button → should click close

full-screen cookie banner without close button → should send ESC or remove only if aggressive

small tooltip/popover → should be ignored

E2E test against a small list of real sites with common blockers (cookie banners, promo modals).
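The three unit cases could be encoded as a table-driven test over simplified fixtures. The planner below is a hypothetical stand-in for the real selection logic, with an assumed 30% size threshold; it reads "send ESC or remove only if aggressive" as: a cookie/privacy banner is left alone in conservative mode, and in aggressive mode the tool tries Escape and then DOM removal.

```typescript
// Hypothetical stand-in for the planning step; thresholds are assumptions.
type Method = "click" | "esc" | "remove";

interface Fixture {
  name: string;
  viewportShare: number; // fraction of the viewport the overlay covers
  hasCloseButton: boolean;
  text: string;
}

// Ordered list of dismissal methods the tool would try (empty = leave it alone).
function plannedMethods(f: Fixture, aggressive: boolean): Method[] {
  if (f.viewportShare < 0.3) return []; // small tooltips/popovers are ignored
  if (/cookie|privacy/i.test(f.text) && !aggressive) return []; // conservative on legal dialogs
  const plan: Method[] = [];
  if (f.hasCloseButton) plan.push("click");
  plan.push("esc", "remove"); // Escape next, DOM removal last
  return plan;
}

const banner: Fixture = { name: "full-screen cookie banner, no close", viewportShare: 1, hasCloseButton: false, text: "We use cookies" };

const cases: { fixture: Fixture; aggressive: boolean; expected: Method[] }[] = [
  {
    fixture: { name: "large modal with close button", viewportShare: 0.4, hasCloseButton: true, text: "Get 10% off" },
    aggressive: false,
    expected: ["click", "esc", "remove"],
  },
  { fixture: banner, aggressive: false, expected: [] },
  { fixture: banner, aggressive: true, expected: ["esc", "remove"] },
  {
    fixture: { name: "small tooltip", viewportShare: 0.01, hasCloseButton: false, text: "Copied!" },
    aggressive: true,
    expected: [],
  },
];

// Returns the names of failing cases (empty when everything passes).
function runCases(): string[] {
  return cases
    .filter((c) => JSON.stringify(plannedMethods(c.fixture, c.aggressive)) !== JSON.stringify(c.expected))
    .map((c) => `${c.fixture.name} (aggressive=${c.aggressive})`);
}
```

The same fixture table could later drive the E2E layer, with each fixture rendered as a small HTML page instead of a plain object.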

Config / opt-in

Expose in Stagehand config:

```yaml
removeBlocker:
  enabled: true
  aggressive: false
  domainOverrides:
    "example.com": { aggressive: true }
  whitelistSelectors: []
```
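At runtime, the config above has to be resolved per page. A minimal sketch, assuming overrides are keyed by hostname and that subdomains inherit a parent domain's override (a design choice, not something the proposal specifies); `RemoveBlockerConfig` and `resolveRemoveBlockerSettings` are hypothetical names.

```typescript
// Hypothetical config shape mirroring the config above; not an existing Stagehand option.
interface RemoveBlockerConfig {
  enabled: boolean;
  aggressive: boolean;
  domainOverrides: Record<string, { aggressive?: boolean }>;
  whitelistSelectors: string[];
}

interface EffectiveSettings {
  enabled: boolean;
  aggressive: boolean;
  whitelistSelectors: string[];
}

// Exact hostname first, then parent domains ("shop.example.com" -> "example.com"),
// never a bare TLD. Label-based matching avoids "notexample.com" matching "example.com".
function resolveRemoveBlockerSettings(cfg: RemoveBlockerConfig, pageUrl: string): EffectiveSettings {
  const labels = new URL(pageUrl).hostname.split(".");
  let override: { aggressive?: boolean } | undefined;
  for (let i = 0; i < Math.max(1, labels.length - 1) && !override; i++) {
    const key = labels.slice(i).join(".");
    if (Object.prototype.hasOwnProperty.call(cfg.domainOverrides, key)) {
      override = cfg.domainOverrides[key];
    }
  }
  return {
    enabled: cfg.enabled,
    aggressive: override?.aggressive ?? cfg.aggressive,
    whitelistSelectors: cfg.whitelistSelectors,
  };
}
```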

Oct 25 '25 06:10 blr419