feat: web toolkit with stagehand (#1406)
Description
Creates a web toolkit featuring a tool built with the Stagehand library, enabling the ChatAgent to interact with webpages.
Motivation and Context
Adds a toolkit that supports a degree of interaction with rendered webpages, so the agent can perform web-based tasks (e.g. clicking elements, scrolling pages, opening a given URL, taking a screenshot, and using an MLLM to understand the webpage content).
If it fixes an open issue, please link to the issue here.
close #1406
- [x] I have raised an issue to propose this change (https://github.com/camel-ai/camel/issues/1406) for the web toolkit.
Types of changes
- [ ] Bug fix (non-breaking change which fixes an issue)
- [x] New feature (non-breaking change which adds core functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to change)
- [ ] Documentation (update in the documentation)
- [ ] Example (update in the folder of example)
Implemented Tasks
- [x] created web toolkit for the ChatAgent
- [x] created a tool with Stagehand library in the web toolkit
Checklist
- [x] I have read the CONTRIBUTION guide. (required)
- [ ] My change requires a change to the documentation.
- [ ] I have updated the tests accordingly. (required for a bug fix or a new feature)
- [ ] I have updated the documentation accordingly.
Thanks @X-TRON404,
I tried to run the example code, but from the log it seems the web_toolkit was not called as expected: the content returned from the tool call is "". Could you check this?
2025-02-09 14:16:42,394 - camel.agents.chat_agent - INFO - Model gpt-4o-mini, index 0, processed these messages: [{'role': 'system', 'content': 'You are a helpful assistant capable of performing web \ninteractions and answering questions with real-time data. When appropriate, \nuse the available \ntools to automate web tasks and retrieve information.\n'}, {'role': 'user', 'content': 'What is the most visited website in the world?'}, {'role': 'assistant', 'content': '', 'tool_calls': [{'id': 'call_6fSh6xakJHQ2pW9ApOkcArLd', 'type': 'function', 'function': {'name': 'stagehand_tool', 'arguments': '{"task_prompt": "Find the most visited website in the world as of now."}'}}]}, {'role': 'tool', 'content': '""', 'tool_call_id': 'call_6fSh6xakJHQ2pW9ApOkcArLd'}]
As of now, the most visited website in the world is Google. It consistently ranks at the top due to its search engine services, along with other platforms like YouTube and Facebook following closely behind.
This PR is still missing unit tests.
I derived some unit tests from GAIA and will upload them later.
Let me check that I understand the scenario. In role-playing, the initial task decomposition happens step by step: a large task is split into several smaller ones. Suppose a task consists mainly of browser operations and is split into three subtasks: Task 1, log into a system and find a specific menu; Task 2, query data by some conditions and retrieve the details; Task 3, the remaining operations. Tasks 1 and 2 both need the browser, and Task 2 should reuse the browser session from Task 1. The core question is how to keep the browser session alive across multiple tool_calls. The prompt could be modified to work around this, but that would make the solution less general.
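One possible way to picture the session-reuse idea is a toolkit instance that creates its session lazily and hands the same object to every later tool call. The sketch below is only an illustration under stated assumptions: `FakeBrowserSession`, `SessionKeepingWebToolkit`, and `web_task` are hypothetical names invented here, and the fake session is a stand-in for a real browser, not Stagehand's or CAMEL's actual API.

```python
import itertools
from typing import List, Optional


class FakeBrowserSession:
    """Hypothetical stand-in for a real browser session (not a real API)."""

    _ids = itertools.count(1)

    def __init__(self) -> None:
        self.session_id = next(self._ids)
        self.history: List[str] = []
        self.closed = False

    def run(self, task_prompt: str) -> str:
        # A real session would drive the browser here; this only records the step.
        if self.closed:
            raise RuntimeError("session is closed")
        self.history.append(task_prompt)
        return f"session {self.session_id}: completed step {len(self.history)}"

    def close(self) -> None:
        self.closed = True


class SessionKeepingWebToolkit:
    """Hypothetical toolkit that owns one session and reuses it across calls."""

    def __init__(self) -> None:
        self._session: Optional[FakeBrowserSession] = None

    def _get_session(self) -> FakeBrowserSession:
        # Create the session on first use, or again after it has been closed.
        if self._session is None or self._session.closed:
            self._session = FakeBrowserSession()
        return self._session

    def web_task(self, task_prompt: str) -> str:
        # Every tool call goes through the same cached session.
        return self._get_session().run(task_prompt)

    def close(self) -> None:
        if self._session is not None:
            self._session.close()
            self._session = None


toolkit = SessionKeepingWebToolkit()
first = toolkit.web_task("Log into the system and open the menu")
second = toolkit.web_task("Query data by condition and fetch details")
print(first)
print(second)
toolkit.close()
```

With this shape, the state lives on the toolkit instance rather than in the prompt, so Task 2 sees whatever Task 1 left behind (login state, current page) as long as the same toolkit instance is passed to the agent. Whether this fits the tool in this PR depends on how the underlying browser process is started and torn down per call.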