Regression on Stagehand 2.2.1: Text extraction on links sometimes grabs an incorrect number

Open jds2501-cs opened this issue 6 months ago • 5 comments

This code previously worked on stagehand 2.1:

    const { count } = await page.extract({
      instruction: `Extract the ${widgetTitle} link's value`,
      schema: z.object({
        count: z.number(),
      }),
    });

On 2.1 this consistently produced the target number. Since upgrading to 2.2.1 we get inconsistent numbers. The workaround we implemented, which works consistently, is:

    const [action] = await page.observe({
      instruction: `Extract the ${widgetTitle} link's value`,
    });

    const { count } = await page.extract({
      selector: action.selector,
      schema: z.object({
        count: z.number(),
      }),
    });

This tells me the problem is in how extract processes instructions: observe is consistently accurate, extract is not.
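
For anyone who wants to reuse the workaround, here is a minimal sketch of it wrapped in a helper. `extractViaObserve`, `PageLike`, and `ObservedAction` are hypothetical names introduced for illustration, not part of Stagehand; the types only model the `observe` and `extract` call shapes used above, and the schema is passed through untouched.

```typescript
// Hypothetical structural types: only the call shapes used in this issue.
interface ObservedAction {
  selector: string;
}

interface PageLike<S, T> {
  observe(options: { instruction: string }): Promise<ObservedAction[]>;
  extract(options: { selector: string; schema: S }): Promise<T>;
}

// Hypothetical helper: locate the element with observe, then run extract
// scoped to that element's selector instead of passing an instruction.
async function extractViaObserve<S, T>(
  page: PageLike<S, T>,
  instruction: string,
  schema: S,
): Promise<T> {
  const [action] = await page.observe({ instruction });
  if (!action) {
    // Fail loudly rather than falling back to an unscoped extract.
    throw new Error(`observe returned no match for: ${instruction}`);
  }
  return page.extract({ selector: action.selector, schema });
}
```

Throwing when observe finds nothing is a design choice in this sketch: it keeps a missing element from being silently reported as a wrong number.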

jds2501-cs avatar May 22 '25 02:05 jds2501-cs

@seanmcguire12 this looks related to your changes, since I see your name associated with changes around extract.

jds2501-cs avatar May 22 '25 02:05 jds2501-cs

Here's the HTML (part of a larger page) that the test interacts with:

<a data-test-selector="pivot-link" class="focusable inline-block hover:text-focus active:text-focus" title="" target="_blank" href=""> 20 </a>

jds2501-cs avatar May 22 '25 02:05 jds2501-cs

Hey @jds2501-cs! Thanks for trying out the latest version. If you are looking to get the text value of the link here, I would try a prompt like "extract the ${widgetTitle} link's text". The LLM never gets to see the raw HTML structure, so it's understandable that it might be confused about what you mean by "value".

seanmcguire12 avatar May 22 '25 02:05 seanmcguire12

I don't think this is the LLM getting confused - if that was the case, I would see inconsistent results on observe. Observe consistently works without any issues.

jds2501-cs avatar May 22 '25 14:05 jds2501-cs

Also, with that prompt, extract is finding a number that doesn't exist on the page.
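
One way to check this kind of claim in a test is to compare the extracted count against the page's visible text. This is a minimal sketch that assumes the text has already been read by some other means; `numberAppearsInText` is a hypothetical helper, not a Stagehand API, and it only handles plain non-negative numbers.

```typescript
// Hypothetical helper: true when `value` appears as a standalone number
// somewhere in `pageText` (digits embedded in a longer number do not count).
function numberAppearsInText(pageText: string, value: number): boolean {
  const numbers = pageText.match(/\d+(?:\.\d+)?/g) ?? [];
  return numbers.some((n) => Number(n) === value);
}

// Using the link text from the HTML snippet above.
console.log(numberAppearsInText(" 20 ", 20)); // true
console.log(numberAppearsInText(" 20 ", 7)); // false
```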

jds2501-cs avatar May 22 '25 14:05 jds2501-cs