
Scroll down failed when the scrollable element is not root element

Status: Open. treblam opened this issue 7 months ago • 12 comments

Bug Description

I'm using browser-use with the Clash dashboard, and it can't scroll down the page. I've tried to investigate the problem: it seems that when the scrollable element is not the root element but a div inside the page, browser-use can't scroll the page. The scrollable element is shown in the screenshot below:

[Screenshot: the scrollable element]
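A rough way to see why scrolling the window does nothing on a page like this: the element that actually scrolls is the first ancestor whose computed `overflow-y` allows scrolling and whose content is taller than its box. The sketch below models that rule in plain Python, assuming each ancestor's computed `overflowY`, `scrollHeight` and `clientHeight` have already been collected from the page (e.g. with `getComputedStyle`). `BoxInfo`, `find_scroll_container` and the sample numbers are hypothetical and not part of browser-use.

```python
from typing import Optional, Sequence, TypedDict


class BoxInfo(TypedDict):
    """Hypothetical per-element snapshot, collected in the browser beforehand."""
    tag: str             # lower-case tag name, e.g. "div" or "html"
    overflow_y: str      # getComputedStyle(el).overflowY
    scroll_height: int   # el.scrollHeight
    client_height: int   # el.clientHeight


# overflow-y values that let an element scroll its own content
SCROLLING_OVERFLOW = {"auto", "scroll", "overlay"}


def is_vertically_scrollable(box: BoxInfo) -> bool:
    """True if the element allows scrolling and its content is taller than its box."""
    allowed = set(SCROLLING_OVERFLOW)
    if box["tag"] == "html":
        # Simplification: "visible" overflow on the root is applied to the viewport
        # as "auto", so an overflowing document scrolls the window.
        allowed.add("visible")
    return box["overflow_y"] in allowed and box["scroll_height"] > box["client_height"]


def find_scroll_container(chain: Sequence[BoxInfo]) -> Optional[BoxInfo]:
    """Return the first scrollable element, walking from the target up to <html>.

    chain[0] is the element of interest and chain[-1] is the root element.
    """
    for box in chain:
        if is_vertically_scrollable(box):
            return box
    return None


# Made-up numbers for a layout where the document fits the window but a div scrolls.
chain: list = [
    {"tag": "button", "overflow_y": "visible", "scroll_height": 32, "client_height": 32},
    {"tag": "div", "overflow_y": "auto", "scroll_height": 2400, "client_height": 800},
    {"tag": "html", "overflow_y": "hidden", "scroll_height": 800, "client_height": 800},
]
print(find_scroll_container(chain)["tag"])  # → div
```

In a layout like this, scrolling the window has nothing to move; only the inner div's `scrollTop` can change.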

Reproduction Steps

1. Install browser-use.
2. Run the task below (I'm using the Gemini 2.5 Pro model).
3. The agent scrolls down the page multiple times, but in fact the page doesn't scroll at all.

[Screenshot]

The web page is on my home network, so I can't provide an online URL. Instead, I've uploaded the page in the attachment below; download it to your computer to reproduce the problem, and make sure to update the page location in the code sample to the corresponding file system path.

clash_dashboard.zip

Code Sample

from langchain_google_genai import ChatGoogleGenerativeAI
from dotenv import load_dotenv
load_dotenv()

from browser_use import Agent, Browser, Controller, ActionResult
import asyncio

llm = ChatGoogleGenerativeAI(
    model='gemini-2.5-pro-preview-03-25',
    # api_key=SecretStr(os.getenv('GEMINI_API_KEY')),  # needs `import os` and `from pydantic import SecretStr`
    temperature=0,
    max_tokens=None,
    timeout=None,
    max_retries=2,
)

controller = Controller()

@controller.action('Ask user to solve captcha')
def solve_captcha(question: str) -> ActionResult:
    answer = input('\nThere is a captcha, can you solve it?\nInput: ')
    return ActionResult(extracted_content=answer)

@controller.action('Ask user for information')
def ask_human(question: str) -> ActionResult:
    answer = input(f'\n{question}\nInput: ')
    return ActionResult(extracted_content=answer)


async def main():
    # Create the browser explicitly so it can be closed at the end
    browser = Browser()
    agent = Agent(
        # '测速' is the label of the dashboard's "speed test" button
        task="navigate to this page file:///Users/jerry/Downloads/Clash.html, and scroll down the page to find the '测速' button",
        llm=llm,
        use_vision=True,
        controller=controller,
        browser=browser,
    )
    result = await agent.run()
    print(result)
    input('Press Enter to close the browser...')
    await browser.close()

asyncio.run(main())

Version

0.1.41

LLM Model

Other (specify in description)

Operating System

macOS 15.3.2

Relevant Log Output

INFO     [browser_use] BrowserUse logging setup complete with level info
INFO     [agent] 🧠 Starting an agent with main_model=models/gemini-2.5-pro-preview-03-25 +vision, planner_model=None, extraction_model=None
INFO     [agent] 🚀 Starting task: navigate to this page file:///Users/jerry/Downloads/Clash.html, and scroll down the page to find the '测速' button
INFO     [agent] 📍 Step 1
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
I0000 00:00:1745659051.914385 14583372 fork_posix.cc:75] Other threads are currently calling into gRPC, skipping fork() handlers
INFO     [agent] 🤷 Eval: Unknown - The initial state is a blank page, so no previous goal to evaluate.
INFO     [agent] 🧠 Memory: Starting the task. The goal is to navigate to file:///Users/jerry/Downloads/Clash.html and find the '测速' button by scrolling down.
INFO     [agent] 🎯 Next goal: Navigate to the specified local HTML file.
INFO     [agent] 🛠️  Action 1/1: {"go_to_url":{"url":"file:///Users/jerry/Downloads/Clash.html"}}
INFO     [controller] 🔗  Navigated to file:///Users/jerry/Downloads/Clash.html
INFO     [agent] 📍 Step 2
INFO     [agent] 👍 Eval: Success - Successfully navigated to the specified local HTML file file:///Users/jerry/Downloads/Clash.html.
INFO     [agent] 🧠 Memory: Navigated to file:///Users/jerry/Downloads/Clash.html. The task is to find the '测速' button by scrolling down.
INFO     [agent] 🎯 Next goal: Scroll down the page to find the '测速' button.
INFO     [agent] 🛠️  Action 1/1: {"scroll_down":{}}
INFO     [controller] 🔍  Scrolled down the page by one page
INFO     [agent] 📍 Step 3
INFO     [agent] 👍 Eval: Success - I successfully scrolled down the page once.
INFO     [agent] 🧠 Memory: Navigated to file:///Users/jerry/Downloads/Clash.html and scrolled down once. The task is to find the '测速' button by scrolling down.
INFO     [agent] 🎯 Next goal: Scroll down the page again to find the '测速' button.
INFO     [agent] 🛠️  Action 1/1: {"scroll_down":{}}
INFO     [controller] 🔍  Scrolled down the page by one page
INFO     [agent] 📍 Step 4
INFO     [agent] 👍 Eval: Success - I successfully scrolled down the page twice.
INFO     [agent] 🧠 Memory: Navigated to file:///Users/jerry/Downloads/Clash.html and scrolled down twice. The task is to find the '测速' button by scrolling down.
INFO     [agent] 🎯 Next goal: Scroll down the page again to find the '测速' button.
INFO     [agent] 🛠️  Action 1/1: {"scroll_down":{}}
INFO     [controller] 🔍  Scrolled down the page by one page
INFO     [agent] 📍 Step 5
INFO     [agent] 👍 Eval: Success - I successfully scrolled down the page three times.
INFO     [agent] 🧠 Memory: Navigated to file:///Users/jerry/Downloads/Clash.html and scrolled down three times. The task is to find the '测速' button by scrolling down.
INFO     [agent] 🎯 Next goal: Scroll down the page again to find the '测速' button.
INFO     [agent] 🛠️  Action 1/1: {"scroll_down":{}}
INFO     [controller] 🔍  Scrolled down the page by one page
INFO     [agent] 📍 Step 6
INFO     [agent] 👍 Eval: Success - I successfully scrolled down the page four times.
INFO     [agent] 🧠 Memory: Navigated to file:///Users/jerry/Downloads/Clash.html and scrolled down four times. The task is to find the '测速' button by scrolling down.
INFO     [agent] 🎯 Next goal: Scroll down the page again to find the '测速' button.
INFO     [agent] 🛠️  Action 1/1: {"scroll_down":{}}
INFO     [controller] 🔍  Scrolled down the page by one page
INFO     [agent] 📍 Step 7
INFO     [agent] 👍 Eval: Success - I successfully scrolled down the page five times.
INFO     [agent] 🧠 Memory: Navigated to file:///Users/jerry/Downloads/Clash.html and scrolled down five times. The task is to find the '测速' button by scrolling down.
INFO     [agent] 🎯 Next goal: Scroll down the page again to find the '测速' button.
INFO     [agent] 🛠️  Action 1/1: {"scroll_down":{}}
INFO     [controller] 🔍  Scrolled down the page by one page
INFO     [agent] 📍 Step 8
INFO     [agent] 👍 Eval: Success - I successfully scrolled down the page six times.
INFO     [agent] 🧠 Memory: Navigated to file:///Users/jerry/Downloads/Clash.html and scrolled down six times. The task is to find the '测速' button by scrolling down.
INFO     [agent] 🎯 Next goal: Scroll down the page again to find the '测速' button.
INFO     [agent] 🛠️  Action 1/1: {"scroll_down":{}}
INFO     [controller] 🔍  Scrolled down the page by one page
INFO     [agent] 📍 Step 9
INFO     [agent] 👍 Eval: Success - I successfully scrolled down the page seven times.
INFO     [agent] 🧠 Memory: Navigated to file:///Users/jerry/Downloads/Clash.html and scrolled down seven times. The task is to find the '测速' button by scrolling down.
INFO     [agent] 🎯 Next goal: Scroll down the page again to find the '测速' button.
INFO     [agent] 🛠️  Action 1/1: {"scroll_down":{}}
INFO     [controller] 🔍  Scrolled down the page by one page
INFO     [agent] 📍 Step 10
INFO     [agent] 👍 Eval: Success - I successfully scrolled down the page eight times.
INFO     [agent] 🧠 Memory: Navigated to file:///Users/jerry/Downloads/Clash.html and scrolled down eight times. The task is to find the '测速' button by scrolling down.
INFO     [agent] 🎯 Next goal: Scroll down the page again to find the '测速' button.
INFO     [agent] 🛠️  Action 1/1: {"scroll_down":{}}
INFO     [controller] 🔍  Scrolled down the page by one page
INFO     [agent] 📍 Step 11
INFO     [agent] 👍 Eval: Success - I successfully scrolled down the page nine times.
INFO     [agent] 🧠 Memory: Navigated to file:///Users/jerry/Downloads/Clash.html and scrolled down nine times. The task is to find the '测速' button by scrolling down.
INFO     [agent] 🎯 Next goal: Scroll down the page again to find the '测速' button.
INFO     [agent] 🛠️  Action 1/1: {"scroll_down":{}}
INFO     [controller] 🔍  Scrolled down the page by one page
INFO     [agent] 📍 Step 12
INFO     [agent] 👍 Eval: Success - I successfully scrolled down the page ten times.
INFO     [agent] 🧠 Memory: Navigated to file:///Users/jerry/Downloads/Clash.html and scrolled down ten times. The task is to find the '测速' button by scrolling down.
INFO     [agent] 🎯 Next goal: Scroll down the page again to find the '测速' button.
INFO     [agent] 🛠️  Action 1/1: {"scroll_down":{}}
INFO     [controller] 🔍  Scrolled down the page by one page
INFO     [agent] 📍 Step 13
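In the log above, every `scroll_down` is reported as "Scrolled down the page by one page" and the model evaluates each step as a success, even though nothing moved. A minimal sketch of a more honest result message, assuming the tool measures the scroll offset before and after the action (for example `window.scrollY`, or `scrollTop` for a container). `describe_scroll` and its wording are hypothetical, not browser-use code.

```python
def describe_scroll(before: float, after: float, max_offset: float) -> str:
    """Turn measured scroll offsets (CSS px) into an action result the model can trust.

    before/after: scroll position of the scrolled element (window.scrollY or el.scrollTop)
    max_offset:   scrollHeight - clientHeight of that same element
    """
    moved = after - before
    if moved != 0:
        return f"Scrolled {moved:.0f}px (now at {after:.0f} of {max_offset:.0f})."
    if max_offset <= 0:
        # Presumably the case in this issue: the document itself does not overflow.
        return ("Scroll had no effect: this element does not overflow; "
                "the content is probably inside an inner scrollable container.")
    if after >= max_offset:
        return "Scroll had no effect: already at the bottom."
    return "Scroll had no effect for an unknown reason (e.g. scrolling blocked by the page)."


print(describe_scroll(0, 800, 1600))  # → Scrolled 800px (now at 800 of 1600).
```

Feeding a message like this back to the LLM would let its Eval step notice that repeated `scroll_down` calls are not changing anything.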

Posted by treblam, Apr 26 '25 09:04

Yeah inline scrolls are really hard to deal with in our current architecture, we've already been thinking about how to best improve this for some time. Open to ideas.

The real difficulty is the vision model: we can't take screenshots of every possible inline scrollable element in every possible scroll state, as that could require many more screenshots and LLM calls per step.

Posted by pirate, Apr 27 '25 01:04

What about asking the LLM where it wants to position the mouse when scrolling (i.e. which element to hover), and then actually doing something like Playwright's page.mouse.wheel(...)? https://playwright.dev/docs/input#scrolling
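A minimal sketch of that idea with Playwright's Python async API, assuming the LLM has already chosen an element (here given as a CSS selector) to hover. `page.mouse.move` followed by `page.mouse.wheel` dispatches a wheel event at the pointer position, so the browser scrolls whichever container is under the cursor, much like a trackpad would. Note that `wheel` does not wait for the scroll to finish. `visible_center` and `wheel_scroll_at` are hypothetical helper names.

```python
from typing import TYPE_CHECKING, Dict, Tuple

if TYPE_CHECKING:  # type hints only; the caller imports and launches Playwright
    from playwright.async_api import Page


def visible_center(box: Dict[str, float], viewport: Dict[str, float]) -> Tuple[float, float]:
    """Center of the part of `box` that lies inside the viewport.

    `box` has the shape returned by Playwright's bounding_box(): x, y, width, height
    (relative to the viewport); `viewport` has width and height.
    """
    left = max(box["x"], 0)
    top = max(box["y"], 0)
    right = min(box["x"] + box["width"], viewport["width"])
    bottom = min(box["y"] + box["height"], viewport["height"])
    if right <= left or bottom <= top:
        raise ValueError("element is outside the viewport")
    return (left + right) / 2, (top + bottom) / 2


async def wheel_scroll_at(page: "Page", selector: str, delta_y: float) -> None:
    """Hover the element matched by `selector`, then send a mouse-wheel event there."""
    box = await page.locator(selector).first.bounding_box()
    if box is None:
        raise ValueError(f"{selector!r} is not visible")
    viewport = page.viewport_size or await page.evaluate(
        "() => ({width: window.innerWidth, height: window.innerHeight})"
    )
    x, y = visible_center(box, viewport)
    await page.mouse.move(x, y)         # the wheel event goes to what is under the pointer
    await page.mouse.wheel(0, delta_y)  # positive delta_y scrolls down
```

Aiming at the visible part of the element, rather than its raw center, keeps the pointer inside the window when the container is taller than the viewport.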

Posted by robinmanuelthiel, Apr 29 '25 21:04

I've created a dedicated challenge to evaluate our progress on fixing this here: https://browser-use.github.io/stress-tests/challenge.html#scroll-accept

Posted by pirate, May 01 '25 01:05

@pirate My locator becomes visible when the page is scrolled down, but the scroll isn't happening. This is the output:

INFO     [agent] 🎯 Next goal: Select 'Manage' for cash.
INFO     [agent] 🛠️  Action 1/2: {"scroll_to_text":{"text":"Manage"}}
INFO     [agent] 🛠️  Action 2/2: {"click_element_by_index":{"index":10}}
INFO     [controller] Text 'Manage' not found or not visible on page
ERROR    [agent] ❌ Result failed 1/1 times:
 Error executing action click_element_by_index: Element with index 10 does not exist - retry or use alternative actions
ERROR    [agent] ❌ Stopping due to 1 consecutive failures

I have tried prompting it to scroll the page, scroll to the locator, etc., and nothing works. Any suggestions? 🙏

Posted by kpmc-anu, May 02 '25 19:05

@pyoneerC Mine is just a simple page scroll or scroll-to-locator, and it's not working. There is nothing fancy in the UI, just a form that needs to be filled in.

Posted by kpmc-anu, May 02 '25 22:05

@pirate Could you please help me with this? I'm blocked. Are there any suggestions you'd like me to try?

Posted by kpmc-anu, May 02 '25 23:05

@pirate I tried the latest main and it still doesn't work. Unfortunately I had to switch back to 0.1.40; scrolling works there, but something else fails:
- 0.1.40: fails to click on text, but scrolls the page and clicks locators well.
- latest main: able to click text, but fails to scroll and click locators.
So I am in a difficult spot :(

With 0.1.40:
] 🧠 Memory: Current: Step 8, Status: success, Retries: 0
INFO     [agent] 🎯 Next goal: Click on 'Manage' for cash
INFO     [agent] 🛠️  Action 1/2: {"click_element":{"index":10}}
INFO     [agent] 🛠️  Action 2/2: {"wait":{"seconds":1}}
INFO     [controller] 🖱️  Clicked button with index 10: Manage
INFO     [controller] 🕒  Waiting for 1 seconds

Posted by kpmc-anu, May 03 '25 17:05

The PR to add support for scrolling a specific element is not merged yet @kpmc-anu: https://github.com/browser-use/browser-use/pull/1553

There were still unfixed lint errors in the PR so it will come in the next release.

Posted by pirate, May 03 '25 17:05


Sounds good, thank you. Once it gets merged to main, I can try it first.

Posted by kpmc-anu, May 03 '25 17:05

@pirate What is a good approach when a locator resolves to multiple elements? "click on a 'Assign'" resolves to 5 elements. Is there a way I can handle this, or should I create a custom action?
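If that step is done with Playwright directly, two common ways to disambiguate are to narrow the search to the element that carries the distinguishing context (`Locator.filter(has_text=...)`) or to pick a match by position (`Locator.nth(i)`). A minimal sketch of the first, assuming each 'Assign' button sits in a table row that also contains a distinguishing label; the row structure and `click_button_in_row` are hypothetical.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:  # type hints only; the caller imports and launches Playwright
    from playwright.async_api import Page


async def click_button_in_row(page: "Page", row_text: str, button_name: str = "Assign") -> None:
    """Click the single `button_name` button whose table row also contains `row_text`."""
    buttons = (
        page.get_by_role("row")
        .filter(has_text=row_text)                             # rows mentioning row_text
        .get_by_role("button", name=button_name, exact=True)   # the button inside those rows
    )
    count = await buttons.count()
    if count != 1:
        # Fail loudly instead of clicking an arbitrary match
        raise RuntimeError(f"expected 1 {button_name!r} button for {row_text!r}, found {count}")
    await buttons.click()
```

If the matches really are indistinguishable, `page.get_by_role("button", name="Assign").nth(2)` picks the third one by zero-based position, which is more brittle when the page layout changes.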

Posted by kpmc-anu, May 07 '25 19:05

The following code scrolls the page outside browser-use, using Playwright directly: it scrolls the inner container to the bottom first, then attaches browser-use to the same browser over CDP.

Version: 0.1.47, OS: Ubuntu

On your Mac, make sure to adjust how the file path is read.

from dotenv import load_dotenv
load_dotenv()
from langchain_google_genai import ChatGoogleGenerativeAI
from browser_use import Agent, Browser, BrowserConfig, Controller, ActionResult
import asyncio
import base64
import os
import pathlib

from playwright.async_api import async_playwright

from browser_use.browser.views import BrowserState
from browser_use.agent.views import AgentOutput

local_file_path = pathlib.Path("<YOUR HTML FILE PATH>").resolve().as_uri()

llm = ChatGoogleGenerativeAI(
    model='gemini-2.5-flash-preview-04-17',
    temperature=0,
    max_tokens=None,
    timeout=None,
    max_retries=2,
)

controller = Controller()

def base64_to_image(base64_string: str, output_filename: str) -> str:
    """Decode a base64 string and write it to an image file."""
    os.makedirs(os.path.dirname(output_filename) or ".", exist_ok=True)
    with open(output_filename, "wb") as f:
        f.write(base64.b64decode(base64_string))
    return output_filename

def new_step_callback(state: BrowserState, model_output: AgentOutput, steps: int):
    """Save the screenshot the agent saw at each step."""
    if state.screenshot:  # the screenshot can be None, e.g. when vision is off
        base64_to_image(
            base64_string=state.screenshot,
            output_filename=f"./screenshots/{steps}.png",
        )

@controller.action('Ask user to solve captcha')
def solve_captcha(question: str) -> ActionResult:
    answer = input('\nThere is a captcha, can you solve it?\nInput: ')
    return ActionResult(extracted_content=answer)

@controller.action('Ask user for information')
def ask_human(question: str) -> ActionResult:
    answer = input(f'\n{question}\nInput: ')
    return ActionResult(extracted_content=answer)

async def main():
    async with async_playwright() as p:
        # Launch Chromium with a CDP port so browser-use can attach to the same browser
        playwright_browser = await p.chromium.launch(
            headless=False,
            slow_mo=100,
            args=[
                "--remote-debugging-port=9222",
                "--disable-web-security"
            ]
        )
        page = await playwright_browser.new_page()

        await page.goto(local_file_path, timeout=60000)
        await page.wait_for_load_state("networkidle")
        # Scroll the inner container (not the window) all the way down
        wrapper = page.locator('div.page-container')
        await wrapper.evaluate("el => el.scrollTop = el.scrollHeight")
        await page.wait_for_load_state("networkidle")
        await page.screenshot(path=os.path.join('<YOUR LOCAL DIRECTORY>', "scrolled_down.png"))

        # Hand the already-scrolled browser to browser-use over CDP
        config = BrowserConfig(cdp_url="http://127.0.0.1:9222", keep_alive=True)
        browser_use = Browser(config=config)
        agent = Agent(
            task="Wait for 2 seconds, Describe what you see",
            llm=llm,
            use_vision=True,
            controller=controller,
            browser=browser_use,
            generate_gif=True,
            register_new_step_callback=new_step_callback
        )

        result = await agent.run()
        print(result)
        input('Press Enter to close the browser...')
        await browser_use.close()
        await playwright_browser.close()

asyncio.run(main())

[Screenshot]
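The workaround above jumps straight to the bottom of `div.page-container`. If the intermediate content matters too (for example, to let a vision model see the whole container, at the cost of more screenshots as noted earlier in the thread), a variant could step through the container one view at a time. A minimal sketch with Playwright's async API, assuming no CSS smooth scrolling on the container; the 100 px overlap, `scroll_offsets` and `screenshot_container_views` are hypothetical.

```python
from typing import TYPE_CHECKING, List

if TYPE_CHECKING:  # type hints only; the caller imports and launches Playwright
    from playwright.async_api import Page


def scroll_offsets(scroll_height: int, client_height: int, overlap: int = 100) -> List[int]:
    """scrollTop values that walk a container from top to bottom, one view at a time.

    Consecutive views overlap by `overlap` px so nothing is missed at the seams.
    """
    max_top = max(scroll_height - client_height, 0)
    step = max(client_height - overlap, 1)
    offsets = list(range(0, max_top, step))
    offsets.append(max_top)  # always end exactly at the bottom
    return offsets


async def screenshot_container_views(page: "Page", container: str, prefix: str = "view") -> List[str]:
    """Scroll `container` view by view and save one screenshot per position."""
    box = page.locator(container).first
    scroll_height, client_height = await box.evaluate("el => [el.scrollHeight, el.clientHeight]")
    paths = []
    for i, top in enumerate(scroll_offsets(scroll_height, client_height)):
        # Setting scrollTop is instant unless the page uses scroll-behavior: smooth
        await box.evaluate("(el, top) => { el.scrollTop = top; }", top)
        path = f"{prefix}_{i}.png"
        await page.screenshot(path=path)
        paths.append(path)
    return paths
```

Calling `await screenshot_container_views(page, 'div.page-container')` in place of the single jump would produce one image per view; the count grows with the container's height, which is exactly the cost trade-off described earlier in the thread.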

Posted by RalphChobok, May 16 '25 12:05


Thanks! I noticed there's a PR in progress for this. I'll wait for it to be merged into the main branch before trying it out.

Posted by treblam, May 20 '25 15:05