browser-use: Add Gemini Computer Use Integration

Summary

Adds native support for Gemini's Computer Use API by translating Computer Use function calls into Actor API calls and returning the results to the Computer Use model. This currently requires subclassing Agent() so that the subclass can handle message formatting and the function-calling loop.
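The following is a minimal sketch of the translation step described above. It assumes that a Computer Use function call arrives as a name plus an argument dict. The `FunctionCall` and `FunctionResponse` dataclasses, the `HANDLERS` table, and the handler functions are all hypothetical stand-ins for illustration. They are not the PR's classes or the google-genai SDK types, and real handlers would drive the browser instead of returning strings.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class FunctionCall:
    """Hypothetical stand-in for a function call emitted by the model."""
    name: str
    args: dict[str, Any] = field(default_factory=dict)


@dataclass
class FunctionResponse:
    """Hypothetical stand-in for the result sent back to the model."""
    name: str
    response: dict[str, Any]


def navigate(url: str) -> str:
    # Hypothetical handler: a real one would load the URL in the browser.
    return f"navigated to {url}"


def click_at(x: int, y: int) -> str:
    # Hypothetical handler: a real one would click at pixel coordinates.
    return f"clicked at ({x}, {y})"


# Maps Computer Use function names to local handlers (hypothetical subset).
HANDLERS: dict[str, Callable[..., str]] = {
    "navigate": navigate,
    "click_at": click_at,
}


def execute(call: FunctionCall) -> FunctionResponse:
    """Run one function call and package the outcome for the model."""
    handler = HANDLERS.get(call.name)
    if handler is None:
        return FunctionResponse(call.name, {"error": f"unsupported function: {call.name}"})
    try:
        return FunctionResponse(call.name, {"result": handler(**call.args)})
    except TypeError as exc:  # wrong or missing arguments from the model
        return FunctionResponse(call.name, {"error": str(exc)})
```

This sketch reports failures as data rather than raising them. That keeps the loop running, so the model can see the error on its next turn.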

Core Files

chat.py - ChatGeminiComputerUse LLM wrapper
- Handles Computer Use tool configuration
- Serializes messages between Browser Use and Gemini formats
- Returns raw responses for function call handling
agent.py - ComputerUseAgent class
- Extends base Agent with Computer Use function calling loop
- Manages multi-turn conversation with screenshot feedback
- Auto-formats results as JSON for structured_output support
bridge.py - ComputerUseBridge
- Converts Gemini function calls to ActionResult objects
- Orchestrates execution via ComputerUseActionExecutor
executor.py - ComputerUseActionExecutor
- Executes Computer Use actions via Actor API
- Handles coordinate denormalization (0-999 → actual pixels)
- Implements all Computer Use functions (click_at, type_text_at, navigate, etc.)
computer_use_system_prompt.md - System prompt template for Computer Use workflow
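The coordinate denormalization handled by executor.py can be sketched as below. This assumes the model's 0-999 coordinates are scaled by dividing by 1000 and multiplying by the viewport size, and that the viewport width and height are obtained from the browser elsewhere. The function names `denormalize` and `to_pixels` are hypothetical and do not come from the PR.

```python
def denormalize(value: int, extent: int, grid: int = 1000) -> int:
    """Map a coordinate on a 0..grid-1 normalized grid to a pixel offset."""
    if not 0 <= value < grid:
        raise ValueError(f"coordinate {value} outside 0..{grid - 1}")
    # Scale proportionally and truncate to a whole pixel.
    return int(value / grid * extent)


def to_pixels(x: int, y: int, width: int, height: int) -> tuple[int, int]:
    """Convert a normalized (x, y) pair to viewport pixels (assumed viewport size)."""
    return denormalize(x, width), denormalize(y, height)
```

For a 1440x900 viewport, `to_pixels(500, 500, 1440, 900)` returns `(720, 450)`, which is the center of the page.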

Usage

import asyncio
import os

from browser_use.llm.gemini_computer_use import ChatGeminiComputerUse, ComputerUseAgent


async def main():
    llm = ChatGeminiComputerUse(
        model='gemini-2.5-computer-use-preview-10-2025',
        api_key=os.getenv('GOOGLE_API_KEY'),
        enable_computer_use=True,
    )

    agent = ComputerUseAgent(
        task="Find the founders of Browser Use startup",
        llm=llm,
        use_vision=True,
        max_function_iterations=X,  # replace X with an integer iteration cap
    )

    # run() must be awaited inside an async function
    result = await agent.run()
    print(result.structured_output.result)


asyncio.run(main())
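The `max_function_iterations` argument presumably limits how many times ComputerUseAgent runs its function-calling loop. Below is a minimal sketch of a bounded loop of that kind. It assumes a model callable that returns either a function call or a final answer. `Call`, `Final`, and `run_function_loop` are hypothetical names and are not the PR's API. In the PR, each observation would also carry a fresh screenshot, whereas here it is only a string.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass
class Call:
    name: str  # function the model wants executed


@dataclass
class Final:
    text: str  # model's final answer


Turn = Union[Call, Final]


def run_function_loop(
    model: Callable[[list[str]], Turn],
    execute: Callable[[Call], str],
    max_function_iterations: int,
) -> Final:
    """Alternate model turns and action execution until the model answers or the cap is hit."""
    history: list[str] = []
    for _ in range(max_function_iterations):
        turn = model(history)
        if isinstance(turn, Final):
            return turn
        # Execute the requested action and feed the observation back to the model.
        history.append(execute(turn))
    return Final(text="stopped: max_function_iterations reached")
```

Consider a scripted model that requests two actions and then answers. It finishes with a cap of 3 or more, but a cap of 2 stops the loop before the model can respond.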

🤖 Generated with Claude Code

Oct 11 '25 20:10 Cheggin

All committers have signed the CLA.

Oct 11 '25 20:10 CLAassistant

This would be great to have; it's a very fast and accurate model.

Oct 30 '25 10:10 Niek
