browser-use
Scroll down fails when the scrollable element is not the root element
Bug Description
When using browser-use with the Clash dashboard, the agent can't scroll down the page. I investigated the problem, and it seems that if the scrollable element is not the root element but a div within the page, scrolling has no effect. The scrollable element is shown in the screenshot below:
Reproduction Steps
1. Install browser-use.
2. Run the following task (I am using the Gemini 2.5 Pro model).
3. The agent scrolls the page down multiple times, but in fact the page doesn't scroll at all.
The web page is on my home network, so I can't provide a public URL. I've uploaded the page in the attachment instead; download it to your computer to reproduce the problem, and make sure to update the page location in the code sample to the corresponding path on your file system.
Code Sample
from langchain_google_genai import ChatGoogleGenerativeAI
from dotenv import load_dotenv

load_dotenv()

from browser_use import Agent, Browser, Controller, ActionResult
import asyncio
import os

llm = ChatGoogleGenerativeAI(
    model='gemini-2.5-pro-preview-03-25',
    # api_key=SecretStr(os.getenv('GEMINI_API_KEY')),
    temperature=0,
    max_tokens=None,
    timeout=None,
    max_retries=2,
)

controller = Controller()


@controller.action('Ask user to solve captcha')
def solve_captcha(question: str) -> ActionResult:
    answer = input('\n There is a captcha, can you solve it?\nInput: ')
    return ActionResult(extracted_content=answer)


@controller.action('Ask user for information')
def ask_human(question: str) -> ActionResult:
    answer = input(f'\n{question}\nInput: ')
    return ActionResult(extracted_content=answer)


async def main():
    # Create the browser explicitly so it can be closed at the end;
    # the original sample called browser.close() on an undefined name.
    browser = Browser()
    agent = Agent(
        task="navigate to this page file:///Users/jerry/Downloads/Clash.html, and scroll down the page to find the '测速' button",
        llm=llm,
        use_vision=True,
        controller=controller,
        browser=browser,
    )
    result = await agent.run()
    print(result)
    input('Press Enter to close the browser...')
    await browser.close()


asyncio.run(main())
Version
0.1.41
LLM Model
Other (specify in description)
Operating System
macOS 15.3.2
Relevant Log Output
INFO [browser_use] BrowserUse logging setup complete with level info
INFO [agent] 🧠 Starting an agent with main_model=models/gemini-2.5-pro-preview-03-25 +vision, planner_model=None, extraction_model=None
INFO [agent] 🚀 Starting task: navigate to this page file:///Users/jerry/Downloads/Clash.html, and scroll down the page to find the '测速' button
INFO [agent] 📍 Step 1
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
I0000 00:00:1745659051.914385 14583372 fork_posix.cc:75] Other threads are currently calling into gRPC, skipping fork() handlers
INFO [agent] 🤷 Eval: Unknown - The initial state is a blank page, so no previous goal to evaluate.
INFO [agent] 🧠 Memory: Starting the task. The goal is to navigate to file:///Users/jerry/Downloads/Clash.html and find the '测速' button by scrolling down.
INFO [agent] 🎯 Next goal: Navigate to the specified local HTML file.
INFO [agent] 🛠️ Action 1/1: {"go_to_url":{"url":"file:///Users/jerry/Downloads/Clash.html"}}
INFO [controller] 🔗 Navigated to file:///Users/jerry/Downloads/Clash.html
INFO [agent] 📍 Step 2
INFO [agent] 👍 Eval: Success - Successfully navigated to the specified local HTML file file:///Users/jerry/Downloads/Clash.html.
INFO [agent] 🧠 Memory: Navigated to file:///Users/jerry/Downloads/Clash.html. The task is to find the '测速' button by scrolling down.
INFO [agent] 🎯 Next goal: Scroll down the page to find the '测速' button.
INFO [agent] 🛠️ Action 1/1: {"scroll_down":{}}
INFO [controller] 🔍 Scrolled down the page by one page
INFO [agent] 📍 Step 3
INFO [agent] 👍 Eval: Success - I successfully scrolled down the page once.
INFO [agent] 🧠 Memory: Navigated to file:///Users/jerry/Downloads/Clash.html and scrolled down once. The task is to find the '测速' button by scrolling down.
INFO [agent] 🎯 Next goal: Scroll down the page again to find the '测速' button.
INFO [agent] 🛠️ Action 1/1: {"scroll_down":{}}
INFO [controller] 🔍 Scrolled down the page by one page
INFO [agent] 📍 Step 4
INFO [agent] 👍 Eval: Success - I successfully scrolled down the page twice.
INFO [agent] 🧠 Memory: Navigated to file:///Users/jerry/Downloads/Clash.html and scrolled down twice. The task is to find the '测速' button by scrolling down.
INFO [agent] 🎯 Next goal: Scroll down the page again to find the '测速' button.
INFO [agent] 🛠️ Action 1/1: {"scroll_down":{}}
INFO [controller] 🔍 Scrolled down the page by one page
INFO [agent] 📍 Step 5
INFO [agent] 👍 Eval: Success - I successfully scrolled down the page three times.
INFO [agent] 🧠 Memory: Navigated to file:///Users/jerry/Downloads/Clash.html and scrolled down three times. The task is to find the '测速' button by scrolling down.
INFO [agent] 🎯 Next goal: Scroll down the page again to find the '测速' button.
INFO [agent] 🛠️ Action 1/1: {"scroll_down":{}}
INFO [controller] 🔍 Scrolled down the page by one page
INFO [agent] 📍 Step 6
INFO [agent] 👍 Eval: Success - I successfully scrolled down the page four times.
INFO [agent] 🧠 Memory: Navigated to file:///Users/jerry/Downloads/Clash.html and scrolled down four times. The task is to find the '测速' button by scrolling down.
INFO [agent] 🎯 Next goal: Scroll down the page again to find the '测速' button.
INFO [agent] 🛠️ Action 1/1: {"scroll_down":{}}
INFO [controller] 🔍 Scrolled down the page by one page
INFO [agent] 📍 Step 7
INFO [agent] 👍 Eval: Success - I successfully scrolled down the page five times.
INFO [agent] 🧠 Memory: Navigated to file:///Users/jerry/Downloads/Clash.html and scrolled down five times. The task is to find the '测速' button by scrolling down.
INFO [agent] 🎯 Next goal: Scroll down the page again to find the '测速' button.
INFO [agent] 🛠️ Action 1/1: {"scroll_down":{}}
INFO [controller] 🔍 Scrolled down the page by one page
INFO [agent] 📍 Step 8
INFO [agent] 👍 Eval: Success - I successfully scrolled down the page six times.
INFO [agent] 🧠 Memory: Navigated to file:///Users/jerry/Downloads/Clash.html and scrolled down six times. The task is to find the '测速' button by scrolling down.
INFO [agent] 🎯 Next goal: Scroll down the page again to find the '测速' button.
INFO [agent] 🛠️ Action 1/1: {"scroll_down":{}}
INFO [controller] 🔍 Scrolled down the page by one page
INFO [agent] 📍 Step 9
INFO [agent] 👍 Eval: Success - I successfully scrolled down the page seven times.
INFO [agent] 🧠 Memory: Navigated to file:///Users/jerry/Downloads/Clash.html and scrolled down seven times. The task is to find the '测速' button by scrolling down.
INFO [agent] 🎯 Next goal: Scroll down the page again to find the '测速' button.
INFO [agent] 🛠️ Action 1/1: {"scroll_down":{}}
INFO [controller] 🔍 Scrolled down the page by one page
INFO [agent] 📍 Step 10
INFO [agent] 👍 Eval: Success - I successfully scrolled down the page eight times.
INFO [agent] 🧠 Memory: Navigated to file:///Users/jerry/Downloads/Clash.html and scrolled down eight times. The task is to find the '测速' button by scrolling down.
INFO [agent] 🎯 Next goal: Scroll down the page again to find the '测速' button.
INFO [agent] 🛠️ Action 1/1: {"scroll_down":{}}
INFO [controller] 🔍 Scrolled down the page by one page
INFO [agent] 📍 Step 11
INFO [agent] 👍 Eval: Success - I successfully scrolled down the page nine times.
INFO [agent] 🧠 Memory: Navigated to file:///Users/jerry/Downloads/Clash.html and scrolled down nine times. The task is to find the '测速' button by scrolling down.
INFO [agent] 🎯 Next goal: Scroll down the page again to find the '测速' button.
INFO [agent] 🛠️ Action 1/1: {"scroll_down":{}}
INFO [controller] 🔍 Scrolled down the page by one page
INFO [agent] 📍 Step 12
INFO [agent] 👍 Eval: Success - I successfully scrolled down the page ten times.
INFO [agent] 🧠 Memory: Navigated to file:///Users/jerry/Downloads/Clash.html and scrolled down ten times. The task is to find the '测速' button by scrolling down.
INFO [agent] 🎯 Next goal: Scroll down the page again to find the '测速' button.
INFO [agent] 🛠️ Action 1/1: {"scroll_down":{}}
INFO [controller] 🔍 Scrolled down the page by one page
INFO [agent] 📍 Step 13
Yeah, inline scrolls are really hard to deal with in our current architecture. We've already been thinking about how best to improve this for some time. Open to ideas.
The real difficulty is the vision model: we can't take screenshots of every possible inline scrollable element in every possible scroll state, as that could require many more screenshots and LLM calls for each step.
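To make that cost concrete, here is a rough back-of-envelope sketch in Python. It assumes one screenshot per viewport-sized scroll position of each container; the `ScrollContainer` type and the sample sizes are hypothetical and not taken from browser-use or from the Clash page.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class ScrollContainer:
    """Hypothetical description of one scrollable element (sizes in CSS pixels)."""
    name: str
    scroll_height: int  # total height of the scrollable content
    client_height: int  # visible height of the element


def scroll_positions(c: ScrollContainer) -> int:
    """Number of viewport-sized positions needed to see all of the content."""
    if c.client_height <= 0:
        return 0
    return max(1, math.ceil(c.scroll_height / c.client_height))


def screenshots_for_every_state(containers: list[ScrollContainer]) -> int:
    """Screenshots needed if every combination of scroll positions is captured."""
    return math.prod(scroll_positions(c) for c in containers)


def screenshots_one_at_a_time(containers: list[ScrollContainer]) -> int:
    """Screenshots needed if each container is scrolled independently."""
    return sum(scroll_positions(c) for c in containers)


if __name__ == "__main__":
    example = [
        ScrollContainer("page root", scroll_height=3000, client_height=1000),
        ScrollContainer("sidebar", scroll_height=2400, client_height=600),
        ScrollContainer("proxy list", scroll_height=5000, client_height=500),
    ]
    print(screenshots_for_every_state(example))  # 3 * 4 * 10 = 120
    print(screenshots_one_at_a_time(example))    # 3 + 4 + 10 = 17
```

Even with only three scrollable regions, covering every combination is far more expensive than one screenshot per step, which is why capturing all states up front is not practical.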
What about asking the LLM where it wants to position the mouse when scrolling (i.e. which element to hover), and then doing something like Playwright's page.mouse.wheel(...)? https://playwright.dev/docs/input#scrolling
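A minimal sketch of that idea with Playwright's async Python API might look like the following. The helper name `wheel_scroll_over` and the default `delta_y` are assumptions made for illustration; this is not how browser-use implements scrolling. Only `locator(...).first.hover()` and `mouse.wheel(delta_x, delta_y)` are real Playwright calls.

```python
from __future__ import annotations

from typing import TYPE_CHECKING

if TYPE_CHECKING:  # only needed for type hints, so Playwright is not imported at runtime here
    from playwright.async_api import Page


async def wheel_scroll_over(page: Page, selector: str, delta_y: float = 600) -> None:
    """Hover the first element matching `selector`, then send a mouse-wheel event.

    The wheel event is dispatched at the current mouse position, so the
    browser scrolls whichever scrollable container sits under the pointer
    instead of always scrolling the root document.
    """
    await page.locator(selector).first.hover()
    await page.mouse.wheel(0, delta_y)


# Usage (assumes an already opened Playwright page; the selector is the one
# used for the Clash page elsewhere in this thread):
#     await wheel_scroll_over(page, "div.page-container", delta_y=800)
```

The selector would have to come from the LLM's choice of element, and a wheel event only has an effect if the hovered element or one of its ancestors is actually scrollable.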
I've created a dedicated challenge to evaluate our progress on fixing this here: https://browser-use.github.io/stress-tests/challenge.html#scroll-accept
@pirate my locator is visible once we scroll down, but the scroll is not happening. This is the output:
INFO [agent] 🎯 Next goal: Select 'Manage' for cash.
INFO [agent] 🛠️ Action 1/2: {"scroll_to_text":{"text":"Manage"}}
INFO [agent] 🛠️ Action 2/2: {"click_element_by_index":{"index":10}}
INFO [controller] Text 'Manage' not found or not visible on page
ERROR [agent] ❌ Result failed 1/1 times:
Error executing action click_element_by_index: Element with index 10 does not exist - retry or use alternative actions
ERROR [agent] ❌ Stopping due to 1 consecutive failures
I have tried prompts for page scroll, scroll to locator, etc. Nothing works. Any suggestions? 🙏
@pyoneerC mine is a simple page scroll or scroll to locator that is not working. There is nothing fancy in the UI, just a form that needs to be filled in.
@pirate could you please help me with this, as I am blocked? Is there anything you would like me to try?
@pirate I took the latest main and it still doesn't work. Unfortunately I had to switch to 0.1.40, where scrolling works but something else fails:
0.1.40: fails to click on text, but scrolling the page and clicking work well.
latest main: able to click text, but fails to scroll and click locators.
So I am in a difficult spot :(
with 0.1.40
] 🧠 Memory: Current: Step 8, Status: success, Retries: 0
INFO [agent] 🎯 Next goal: Click on 'Manage' for cash
INFO [agent] 🛠️ Action 1/2: {"click_element":{"index":10}}
INFO [agent] 🛠️ Action 2/2: {"wait":{"seconds":1}}
INFO [controller] 🖱️ Clicked button with index 10: Manage
INFO [controller] 🕒 Waiting for 1 seconds
The PR to add support for scrolling a specific element is not merged yet @kpmc-anu: https://github.com/browser-use/browser-use/pull/1553
There were still unfixed lint errors in the PR so it will come in the next release.
Sounds good, thank you. If it gets merged to main, I can try it there first.
@pirate what is a good approach when a locator resolves to multiple elements? "click on 'Assign'" resolves to 5 elements. Is there a way to handle this? Should I create a custom action?
The following code will help you scroll down, but it does so outside browser-use, using Playwright instead:
Version: 0.1.47
OS: Ubuntu
Make sure to modify how you read the path of the file on your Mac
from dotenv import load_dotenv

load_dotenv()

import asyncio
import base64
import os
import pathlib

from langchain_google_genai import ChatGoogleGenerativeAI
from playwright.async_api import async_playwright

from browser_use import Agent, Browser, BrowserConfig, Controller, ActionResult
from browser_use.agent.views import AgentOutput
from browser_use.browser.views import BrowserState

local_file_path = pathlib.Path("<YOUR HTML FILE PATH>").resolve().as_uri()

llm = ChatGoogleGenerativeAI(
    model='gemini-2.5-flash-preview-04-17',
    temperature=0,
    max_tokens=None,
    timeout=None,
    max_retries=2,
)

controller = Controller()


def base64_to_image(base64_string: str, output_filename: str):
    """Convert base64 string to image."""
    # Create the target directory if it does not exist yet.
    os.makedirs(os.path.dirname(output_filename), exist_ok=True)
    img_data = base64.b64decode(base64_string)
    with open(output_filename, "wb") as f:
        f.write(img_data)
    return output_filename


def new_step_callback(state: BrowserState, model_output: AgentOutput, steps: int):
    """Capture a screenshot after each agent step."""
    path = f"./screenshots/{steps}.png"
    last_screenshot = state.screenshot
    base64_to_image(
        base64_string=str(last_screenshot),
        output_filename=path
    )


@controller.action('Ask user to solve captcha')
def solve_captcha(question: str) -> ActionResult:
    answer = input('\n There is a captcha, can you solve it?\nInput: ')
    return ActionResult(extracted_content=answer)


@controller.action('Ask user for information')
def ask_human(question: str) -> ActionResult:
    answer = input(f'\n{question}\nInput: ')
    return ActionResult(extracted_content=answer)


async def main():
    async with async_playwright() as p:
        # Launch Chromium with a debugging port so browser-use can attach over CDP.
        playwright_browser = await p.chromium.launch(
            headless=False,
            slow_mo=100,
            args=[
                "--remote-debugging-port=9222",
                "--disable-web-security"
            ]
        )
        page = await playwright_browser.new_page()
        await page.goto(local_file_path, timeout=60000)
        await page.wait_for_load_state("networkidle")

        # Scroll the nested container (not the root element) to the bottom.
        wrapper = page.locator('div.page-container')
        await wrapper.evaluate("el => el.scrollTop = el.scrollHeight")
        await page.wait_for_load_state("networkidle")
        await page.screenshot(path=os.path.join('<YOUR LOCAL DIRECTORY>', "scrolled_down.png"))

        # Attach browser-use to the already running browser.
        config = BrowserConfig(cdp_url="http://127.0.0.1:9222", keep_alive=True)
        browser_use = Browser(config=config)
        agent = Agent(
            task="Wait for 2 seconds, Describe what you see",
            llm=llm,
            use_vision=True,
            controller=controller,
            browser=browser_use,
            generate_gif=True,
            register_new_step_callback=new_step_callback
        )
        result = await agent.run()
        print(result)
        input('Press Enter to close the browser...')
        # The original called browser.close() on an undefined name.
        await playwright_browser.close()


asyncio.run(main())
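The workaround above jumps straight to the bottom of the container. If you would rather scroll one "page" at a time and check whether anything actually moved (the logs earlier in this thread report success even though the page never scrolled), a sketch along these lines may help. The helper names are hypothetical, the code uses plain Playwright rather than any browser-use API, and `div.page-container` is the selector from the workaround, so adjust it for other pages.

```python
from __future__ import annotations

from typing import Any

# JavaScript run inside the page: scroll `el` down by `pages` times its visible
# height and report the scroll position before and after.
_SCROLL_JS = """
(el, pages) => {
    const before = el.scrollTop;
    el.scrollBy({ top: el.clientHeight * pages, behavior: 'instant' });
    return {
        before: before,
        after: el.scrollTop,
        max: Math.max(0, el.scrollHeight - el.clientHeight),
    };
}
"""


async def scroll_container_by_page(page: Any, selector: str, pages: int = 1) -> dict:
    """Scroll the first element matching `selector`; `page` is a Playwright Page."""
    return await page.locator(selector).first.evaluate(_SCROLL_JS, pages)


def did_scroll(result: dict) -> bool:
    """True if the scroll position changed, i.e. the scroll had a visible effect."""
    return result["after"] != result["before"]


def at_bottom(result: dict, tolerance: float = 1.0) -> bool:
    """True if the container is scrolled to (within `tolerance` px of) its end."""
    return result["after"] >= result["max"] - tolerance


# Usage inside main() above, in place of the single scrollTop assignment:
#     result = await scroll_container_by_page(page, "div.page-container")
#     while did_scroll(result) and not at_bottom(result):
#         result = await scroll_container_by_page(page, "div.page-container")
```

Comparing the position before and after the scroll is what distinguishes a real scroll from a no-op on the wrong element.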
Thanks! I noticed there's a PR in progress for this. I'll wait for it to be merged into the main branch before trying it out.