browser-use
browser-use copied to clipboard
Add Gemini Computer Use Integration
Summary
Adds native support for Gemini's Computer Use API by converting Computer Use function calls into Actor Use calls and returning them back to the Computer Use model. Currently requires creating a child of Agent() to allow for message formatting and function calling loop
Core Files
-
chat.py-ChatGeminiComputerUseLLM wrapper- Handles Computer Use tool configuration
- Serializes messages between Browser Use and Gemini formats
- Returns raw responses for function call handling
-
agent.py-ComputerUseAgentclass- Extends base
Agentwith Computer Use function calling loop - Manages multi-turn conversation with screenshot feedback
- Auto-formats results as JSON for
structured_outputsupport
- Extends base
-
bridge.py-ComputerUseBridge- Converts Gemini function calls to
ActionResultobjects - Orchestrates execution via
ComputerUseActionExecutor
- Converts Gemini function calls to
-
executor.py-ComputerUseActionExecutor- Executes Computer Use actions via Actor API
- Handles coordinate denormalization (0-999 → actual pixels)
- Implements all Computer Use functions (click_at, type_text_at, navigate, etc.)
-
computer_use_system_prompt.md- System prompt template for Computer Use workflow
Usage
from browser_use.llm.gemini_computer_use import ChatGeminiComputerUse, ComputerUseAgent
llm = ChatGeminiComputerUse(
model='gemini-2.5-computer-use-preview-10-2025',
api_key=os.getenv('GOOGLE_API_KEY'),
enable_computer_use=True,
)
agent = ComputerUseAgent(
task="Find the founders of Browser Use startup",
llm=llm,
use_vision=True,
max_function_iterations= X,
)
result = await agent.run()
print(result.structured_output.result)
🤖 Generated with Claude Code
This would be great to have, it's a very fast and accurate model.