Clickable (interactive) elements not being detected.

Open IAmAGoodBoy04 opened this issue 9 months ago • 8 comments

Bug Description

When I run the agent on the URL "https://www.ebay.com/sch/i.html?_nkw=cards+against+humanity+christmas+2024", it does not detect anything on the page as interactable after scrolling the page once (no elements get highlighted). I want it to click the next-page button, but that button is not being detected. LLM used: Gemini 2.0 Flash.

Reproduction Steps

Run the agent. (The code sample is a minimal version of my code; I have also tried creating custom controllers for scrolling and using a longer, more specific prompt.)

Code Sample

from browser_use import Agent, Browser, Controller, ActionResult, BrowserConfig
from browser_use.browser.context import BrowserContextConfig, BrowserContext
from langchain_google_genai import ChatGoogleGenerativeAI
import sys
import asyncio
import os
from dotenv import load_dotenv
load_dotenv()

llm = ChatGoogleGenerativeAI(
    model="gemini-2.0-flash", 
    api_key=os.getenv("GOOGLE_API_KEY"),
    temperature=0,
)
controller = Controller()
config = BrowserConfig(
    headless=False,
)
context_config = BrowserContextConfig(
    wait_for_network_idle_page_load_time=3,
    browser_window_size={"width": 1400, "height": 850},
    highlight_elements=True,
)

@controller.action("Open URL")
async def open_url(url: str, browser: BrowserContext):
    page = await browser.get_current_page()
    page.set_default_navigation_timeout(0)
    await page.goto(url)
    print("Opening URL")
    await page.wait_for_load_state("domcontentloaded")
    await page.wait_for_timeout(3000)
    print("Opened URL")
    return ActionResult(return_message="Opened URL")

async def main(url):
    browser = Browser(config=config)
    context = BrowserContext(browser=browser,config=context_config)
    agent = Agent(
        task="Scroll to the botom of the page and click on the next page button",
        llm=llm,
        controller=controller,
        use_vision=False,
        generate_gif=False,
        browser_context=context,
        initial_actions=[
            {'open_url': {'url': url, 'browser': context}},
        ],
    )
    result = await agent.run(max_steps=50)
    await browser.close()

if __name__ == "__main__":
    sys.stdout.reconfigure(encoding="utf-8")
    sys.stderr.reconfigure(encoding="utf-8")
    asyncio.run(main("https://www.ebay.com/sch/i.html?_nkw=cards+against+humanity+christmas+2024"))

Version

0.1.40

LLM Model

Other (specify in description)

Operating System

Windows 11

Relevant Log Output


IAmAGoodBoy04 avatar Mar 23 '25 08:03 IAmAGoodBoy04

It works. Would you test the git version?

INFO     [agent] 🚀 Starting task: Scroll to the botom of the page and click on the next page button
	Opening URL
Opened URL
INFO     [agent] 📍 Step 1
INFO     [agent] 🤷 Eval: Unknown - I don't know if the click was successful yet.
INFO     [agent] 🧠 Memory: Starting with the new task. I have completed 1/10 steps
INFO     [agent] 🎯 Next goal: Scroll to the bottom of the page.
INFO     [agent] 🛠️  Action 1/1: {"scroll_down":{}}
INFO     [controller] 🔍  Scrolled down the page by one page
INFO     [agent] 📍 Step 2
INFO     [agent] 🤷 Eval: Unknown - I don't know if the scroll was successful yet.
INFO     [agent] 🧠 Memory: Starting with the new task. I have completed 1/10 steps. I have scrolled down once.
INFO     [agent] 🎯 Next goal: Scroll to the bottom of the page.
INFO     [agent] 🛠️  Action 1/1: {"scroll_down":{}}
INFO     [controller] 🔍  Scrolled down the page by one page
INFO     [agent] 📍 Step 3
INFO     [agent] 🤷 Eval: Unknown - I don't know if the scroll was successful yet.
INFO     [agent] 🧠 Memory: Starting with the new task. I have completed 1/10 steps. I have scrolled down twice.
INFO     [agent] 🎯 Next goal: Scroll to the bottom of the page.
INFO     [agent] 🛠️  Action 1/1: {"scroll_down":{}}
INFO     [controller] 🔍  Scrolled down the page by one page
INFO     [agent] 📍 Step 4
INFO     [agent] 🤷 Eval: Unknown - I don't know if the scroll was successful yet.
INFO     [agent] 🧠 Memory: Starting with the new task. I have completed 1/10 steps. I have scrolled down three times.
INFO     [agent] 🎯 Next goal: Scroll to the bottom of the page.
INFO     [agent] 🛠️  Action 1/1: {"scroll_down":{}}
INFO     [controller] 🔍  Scrolled down the page by one page
INFO     [agent] 📍 Step 5
INFO     [agent] 🤷 Eval: Unknown - I don't know if the scroll was successful yet.
INFO     [agent] 🧠 Memory: Starting with the new task. I have completed 1/10 steps. I have scrolled down four times.
INFO     [agent] 🎯 Next goal: Scroll to the bottom of the page.
INFO     [agent] 🛠️  Action 1/1: {"scroll_down":{}}
INFO     [controller] 🔍  Scrolled down the page by one page
INFO     [agent] 📍 Step 6
INFO     [agent] 🤷 Eval: Unknown - I don't know if the scroll was successful yet.
INFO     [agent] 🧠 Memory: Starting with the new task. I have completed 1/10 steps. I have scrolled down four times.
INFO     [agent] 🎯 Next goal: Click on the next page button.
INFO     [agent] 🛠️  Action 1/1: {"click_element":{"index":284}}
INFO     [controller] 🖱️  Clicked button with index 284: 
INFO     [agent] 📍 Step 7
INFO     [agent] 👍 Eval: Success - I clicked on the next page button.
INFO     [agent] 🧠 Memory: Starting with the new task. I have completed 1/10 steps. I have scrolled down four times.
INFO     [agent] 🎯 Next goal: Complete the task.
INFO     [agent] 🛠️  Action 1/1: {"done":{"text":"I have scrolled to the bottom of the page and clicked on the next page button.","success":true}}
INFO     [agent] 📄 Result: I have scrolled to the bottom of the page and clicked on the next page button.
INFO     [agent] ✅ Task completed
INFO     [agent] ✅ Successfully

SmartManoj avatar Mar 24 '25 14:03 SmartManoj

> It works. Would you test the git version?

Where can I find that version?

IAmAGoodBoy04 avatar Mar 24 '25 17:03 IAmAGoodBoy04

pip install git+https://github.com/browser-use/browser-use

SmartManoj avatar Mar 25 '25 02:03 SmartManoj

This command didn't work; it just ended up deleting everything.

IAmAGoodBoy04 avatar Mar 26 '25 06:03 IAmAGoodBoy04

Did you get any errors when installing?

https://docs.browser-use.com/development/local-setup

SmartManoj avatar Mar 26 '25 06:03 SmartManoj

I did not get any errors, but after installing, there was only the buildDomTree.js file in the library and nothing else. Is there some way to install it properly using pip only, without using a venv?

IAmAGoodBoy04 avatar Mar 26 '25 07:03 IAmAGoodBoy04

Is the library's location the same as the one found in pip show browser-use?

With a local installation, you can skip the venv too.
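
A minimal sketch of that location check, using only the Python standard library. The helper name describe_install is hypothetical and not part of browser-use or pip; it reports roughly what pip show prints (version and install location), plus how many files the installer recorded, which may help spot an install that left almost nothing behind.

```python
from importlib import metadata
from pathlib import Path


def describe_install(dist_name: str):
    """Return (version, location, recorded_file_count) for an installed
    distribution, or None if it is not installed in this interpreter.

    describe_install is a hypothetical helper written for this thread.
    """
    try:
        dist = metadata.distribution(dist_name)
    except metadata.PackageNotFoundError:
        return None
    # dist.files can be None when the installer did not record a file list.
    files = dist.files or []
    # locate_file("") resolves to the directory the distribution lives in
    # (typically site-packages).
    location = Path(str(dist.locate_file(""))).resolve()
    return dist.version, location, len(files)


if __name__ == "__main__":
    info = describe_install("browser-use")
    if info is None:
        print("browser-use is not installed in this interpreter")
    else:
        version, location, file_count = info
        print(f"version={version} location={location} recorded_files={file_count}")
```

Running it with the same interpreter that runs the agent matters, since a different Python may have a different site-packages directory. A very small recorded file count would be consistent with the incomplete install described above.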

SmartManoj avatar Mar 26 '25 07:03 SmartManoj

Yes, I checked in that location only.

IAmAGoodBoy04 avatar Mar 26 '25 08:03 IAmAGoodBoy04

This error should be fixed now; there was briefly a build issue. Please pull main, run uv sync, and try again.

pirate avatar Mar 26 '25 18:03 pirate

I cloned the repo into Lib/site-packages and ran pip install . from there, and that worked for me. Thanks for the help.

IAmAGoodBoy04 avatar Mar 26 '25 19:03 IAmAGoodBoy04