Limit scope with locators

Open loop-automation opened this issue 9 months ago • 5 comments

Absolutely love the project. I think this is the right spot for this, but tell me to go away if not.

I'm currently working on a project where Stagehand's act() method occasionally interacts with unintended sections of our application's DOM, such as the leftNav, instead of focusing on the main content area. This happens even when I specify locators in the prompts, leading to unpredictable automation behavior.

To improve the determinism and reliability of Stagehand's interactions, I suggest enhancing the act() method to accept a Playwright Locator object as an optional parameter. This would allow users to explicitly define the DOM scope for each action, ensuring interactions are confined to the intended section. For example:

const mainContentLocator = page.locator('.content-wrapper');
await page.act({
  action: "type %email% into the input field with name='email'",
  variables: { email: "[email protected]" },
  scope: mainContentLocator
});

In this scenario, the scope parameter restricts the act() method's evaluation to the .content-wrapper element, preventing unintended interactions with elements like the leftNav.

Limiting the DOM scope of Stagehand's act() method via Playwright locators could also significantly reduce unnecessary token usage.

loop-automation avatar Mar 13 '25 21:03 loop-automation

Thanks so much for taking the time to try Stagehand! Have you tried using observe? You can use observe to plan out act so it doesn't do anything you don't intend it to do. You can also override any suggested XPaths/actions from observe, so you know exactly what act is going to do.
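
One way to read this suggestion is as a client-side filter: take what observe proposes, discard anything outside the region you care about, and only then act. The sketch below is a rough illustration under stated assumptions, not Stagehand's API. The `ObservedAction` shape is an assumed stand-in for an observe suggestion, and `filterToScope` and `normalizeXPath` are hypothetical helpers. The comparison is purely string-based and assumes absolute XPaths written in the same form.

```typescript
// Assumed shape of one observe() suggestion. The field names are an
// assumption for illustration; check them against your Stagehand version.
interface ObservedAction {
  selector: string; // e.g. "xpath=/html/body/main/div[1]/div[3]"
  description: string;
  method?: string;
  arguments?: string[];
}

// Strip an optional "xpath=" prefix and a single trailing slash.
function normalizeXPath(selector: string): string {
  const path = selector.startsWith("xpath=")
    ? selector.slice("xpath=".length)
    : selector;
  return path.length > 1 && path.endsWith("/") ? path.slice(0, -1) : path;
}

// Hypothetical helper: keep only suggestions located at or under scopeXPath.
function filterToScope(
  candidates: ObservedAction[],
  scopeXPath: string
): ObservedAction[] {
  const scope = normalizeXPath(scopeXPath);
  return candidates.filter((candidate) => {
    const path = normalizeXPath(candidate.selector);
    // Match the scope node itself or any descendant. Appending "/" stops
    // "/html/body/div" from matching the sibling "/html/body/div[2]".
    return path === scope || path.startsWith(scope + "/");
  });
}

// Made-up suggestions that all mention the same label in different regions.
const suggestions: ObservedAction[] = [
  { selector: "xpath=/html/body/nav/ul/li[2]/a", description: "ABC Capital link in left nav", method: "click" },
  { selector: "xpath=/html/body/main/div[1]/div[3]", description: "ABC Capital dropdown option", method: "click" },
  { selector: "xpath=/html/body/main/table/tbody/tr[4]/td[1]", description: "ABC Capital table cell", method: "click" },
];

// Only the suggestion inside the dropdown container survives the filter.
const inDropdown = filterToScope(suggestions, "xpath=/html/body/main/div[1]");
console.log(inDropdown.map((c) => c.description));
```

The surviving suggestion is the one you would then hand to act. This narrows what gets acted on, but it does not reduce what observe sends to the model in the first place.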

kamath avatar Mar 13 '25 21:03 kamath

Hmm, so you'd observe in a loop while the suggested XPath doesn't match? If you have no way of suggesting where to look, how will first observing before acting help?

JosXa avatar Mar 13 '25 23:03 JosXa

Thanks for the responses! I really appreciate the guidance. I’ve tried using observe, but it’s entirely possible I still have some things to learn there.

My experience so far is that even when I give observe specific instructions, it doesn’t always identify the right area.

For example:

const brcResults = await page.observe({
  instruction: "Find the option labeled 'ABC Capital' in the dropdown menu that has classes 'VirtualizedSelectOption' and 'Select-option'",
  onlyVisible: true
});

This tends to get confused with other values in the table labeled "ABC Capital." My solution was to constrain the scope of where observe or act can look by using a locator to eliminate the ambiguity.

I'm hoping I can constrain the scope of the DOM that can be chunked and sent to the LLM, therefore proactively removing areas it is accidentally identifying from ever even being processed.

loop-automation avatar Mar 13 '25 23:03 loop-automation

After having some coffee, I feel I could be a bit clearer:

My thought process is that we can control the chunking scope by passing in a locator that constrains the chunking to its children. This has a couple of core benefits: it limits token usage by only passing in the relevant portion of the DOM, creates more determinism by allowing a human to limit the scope, and improves performance by reducing the amount of data processed at once.

Example: This would only chunk the children of the class .leftNav:

await page.act({
  action: "click 'funds' under the 'bulk edit' section",
  locator: ".leftNav",  
});
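
To make the token argument concrete, here is a toy model of the idea. It is only a sketch: `ToyNode`, `serialize`, and `findByClass` are hypothetical names, and nothing here reflects how Stagehand actually chunks or serializes the DOM. It just shows that serializing the subtree under a scope produces less text than serializing the whole page, and that out-of-scope content never reaches the prompt.

```typescript
// Toy DOM node. This models the idea only; it is not Stagehand's
// internal representation.
interface ToyNode {
  tag: string;
  className?: string;
  text?: string;
  children?: ToyNode[];
}

// Serialize a subtree into flat markup, standing in for prompt text.
function serialize(node: ToyNode): string {
  const cls = node.className ? ` class="${node.className}"` : "";
  const inner =
    (node.text ?? "") + (node.children ?? []).map(serialize).join("");
  return `<${node.tag}${cls}>${inner}</${node.tag}>`;
}

// Depth-first search for the first node whose class equals className.
function findByClass(node: ToyNode, className: string): ToyNode | undefined {
  if (node.className === className) return node;
  for (const child of node.children ?? []) {
    const hit = findByClass(child, className);
    if (hit) return hit;
  }
  return undefined;
}

// A made-up page with a left nav and a main content area.
const toyPage: ToyNode = {
  tag: "body",
  children: [
    {
      tag: "nav",
      className: "leftNav",
      children: [
        { tag: "a", text: "funds" },
        { tag: "a", text: "bulk edit" },
      ],
    },
    {
      tag: "main",
      className: "content-wrapper",
      children: [
        { tag: "input", className: "email" },
        { tag: "table", children: [{ tag: "td", text: "ABC Capital" }] },
      ],
    },
  ],
};

const scopedRoot = findByClass(toyPage, "leftNav");
const fullPrompt = serialize(toyPage);
// Fall back to the full page if the scope is not found.
const scopedPrompt = scopedRoot ? serialize(scopedRoot) : fullPrompt;

// Character counts as a crude proxy for tokens.
console.log(fullPrompt.length, scopedPrompt.length);
```

Because the table cell lives outside .leftNav, the scoped text cannot contain it, which is the determinism benefit: the model cannot pick an element it was never shown.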

loop-automation avatar Mar 14 '25 15:03 loop-automation

This would also be really great for the extract API. If we could limit the scope where extraction happens, that would be a great way to reduce tokens, and it would also allow workflows like:

  • observe where hover menus might be
  • hover over the observed menu
  • extract data from just the menu

ETA: I just saw that a selector is supported when useTextExtract is true. It would be nice to have support for it when useTextExtract is false, for cases that need more than the text.

CaffeinatedCM avatar Mar 28 '25 16:03 CaffeinatedCM

Hey @CaffeinatedCM @loop-automation @JosXa , we added support for this natively across extract and observe on v3, check it out!

const tableData = await stagehand.extract(
  "Extract the values of the third row",
  z.object({
    values: z.array(z.string())
  }),
  {
    // xPath or CSS selector
    selector: "xpath=/html/body/div/table/" 
  }
);

PS: to use this with act, feel free to pass the Action returned by observe directly into act for a fully deterministic action (no inference).

miguelg719 avatar Nov 03 '25 03:11 miguelg719