[Bug]: No module named 'testbed_utils' (GAIA MultiAgent v0.1)
Describe the bug
Hello, I'm trying to get the recently published MultiAgent workflow to run, but the readme is sparse and I'm having issues importing a module named 'testbed_utils', which I presume to be a custom module.
When running scenario.py, I get this error back:
ModuleNotFoundError: No module named 'testbed_utils'
I have installed the requirements included but this module is not listed.
Steps to reproduce
- Clone the MultiAgent workflow (see link in description)
- Create a venv with python==3.10 (I'm using conda)
- Install requirements.txt
- Run python scenario.py
Model Used
No response
Expected Behavior
The agent workflow should run (based on prompt.txt).
Screenshots and logs
No response
Additional Information
Windows 11
pyautogen==0.2.17
Python 3.10.13
Thanks,
So that's our GAIA entry, and it expects to run in the AutoGenBench environment. From memory, I think the fastest way to get up and running is:
git clone [email protected]:microsoft/autogen.git
git checkout gaia_multiagent_v01_march_1st
cd autogen/samples/tools/autogenbench
install -e .
cd scenarios/GAIA
python Scripts/init_tasks.py
autogenbench run Tasks/gaia_validation_level_1__Orchestrator.jsonl
Lines 1-2 clone the repo and check out the right branch. Lines 3-4 navigate to autogenbench and install it. Lines 5-6 navigate to the GAIA scenario and initialize the tasks (downloading them from Hugging Face and converting them to our format). Line 7 runs the benchmark.
AutoGenBench requires Docker to be running.
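If you're not sure whether the daemon is up, a quick check is:
docker info
which will error out immediately if Docker isn't running.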
Now, if you are just trying to use this agent configuration for some other problem, and have no interest in running GAIA, I advise you to still do steps 1-2 (because the template depends on a version of autogen that diverges from main), and install it. Then you can find the testbed_utils file here: https://github.com/microsoft/autogen/blob/gaia_multiagent_v01_march_1st/samples/tools/autogenbench/autogenbench/template/testbed_utils.py
Or, just delete the references to it -- testbed_utils isn't needed for anything except logging.
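If you'd rather keep the import, a minimal sketch for making it optional (the no-op fallback below is my suggestion, not something shipped in the template) is:

# make testbed_utils optional: real module inside AutoGenBench, no-op stub elsewhere
try:
    import testbed_utils  # available when running inside the AutoGenBench template
except ImportError:
    class _NoOpTestbedUtils:
        # swallow any logging hook (e.g. init()/finalize()) with a do-nothing callable
        def __getattr__(self, name):
            return lambda *args, **kwargs: None
    testbed_utils = _NoOpTestbedUtils()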
Finally, you will need to run this in an appropriate environment. The Dockerfile we use is: https://github.com/microsoft/autogen/blob/gaia_multiagent_v01_march_1st/samples/tools/autogenbench/autogenbench/res/Dockerfile
If you don't use Docker, then install the following requirements:
pip install python-docx pdfminer.six requests pillow easyocr python-pptx SpeechRecognition pandas openpyxl pydub mammoth puremagic youtube_transcript_api==0.6.0
And install the following command-line tools:
sudo apt-get install ffmpeg exiftool
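If you're on Windows and skipping Docker, apt-get won't apply; assuming Chocolatey packages exist for both tools, the rough equivalent would be:
choco install ffmpeg exiftool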
I should probably add all this to the readme :)
Our plan is to do some ablation studies, and once we know which components are truly necessary for this entry, we can simplify all this and ship a more turnkey solution.
Oh, and I nearly forgot! Your OAI_CONFIG_LIST should look like this:
[
{
"model": "gpt-4-turbo-preview",
"api_key": "REDACTED",
"tags": ["llm"],
"organization": "REDACTED",
"max_retries": 65535
},
{
"model": "gpt-4-vision-preview",
"api_key": "REDACTED",
"tags": ["mlm"],
"max_tokens": 1000,
"organization": "REDACTED",
"max_retries": 65535
}
]
Note the vision model has tag "mlm", and the language model has tag "llm".
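For context, the scripts pick models by those tags. A minimal sketch of how that filtering works with autogen's config helpers (the variable names are mine, and the exact call sites in the template may differ):

import autogen

# select the language model entry (tag "llm") and the vision model entry (tag "mlm")
llm_config_list = autogen.config_list_from_json("OAI_CONFIG_LIST", filter_dict={"tags": ["llm"]})
mlm_config_list = autogen.config_list_from_json("OAI_CONFIG_LIST", filter_dict={"tags": ["mlm"]})

print(llm_config_list[0]["model"])  # gpt-4-turbo-preview
print(mlm_config_list[0]["model"])  # gpt-4-vision-preview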
You will also need a Bing API key for the web surfer. It reads this key from the OS environment variable "BING_API_KEY".
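For example, on Linux/macOS you can set it before launching (on Windows, use setx or the System Properties dialog):
export BING_API_KEY="<your key>"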
Let me just echo this point again: Once we've had a chance to do some ablation studies, we will simplify the installation process, and likely merge some components into main for general use. At present, this is all still very experimental.
Thank you for the thorough walkthrough!
It's not a problem that it's experimental, I'm just looking into this for research purposes, not production. Still looking forward to your refined version, once that's ready!
I'm going to try the steps you provided for general use and come back to you with any further issues.
As I run install -e . in the specified directory, I'm prompted to specify the location of a zip file named nvm-windows:
(gaiaagent) C:\Users\sbene\Projects\AutoGen-tests\gaia_multiagent_v1\autogen\samples\tools\autogenbench>install -e .
Enter the absolute path where the nvm-windows zip file is extracted/copied to:
Does this refer to the version of nvm-windows found here?
EDIT: I guess this means pip install -e . instead, which worked.
I'm facing an issue where the web_surfer agent seems unable to perform queries to Bing web search. It claims "I'm going to search for X now ...", but when the orchestrator asks for a summary, it says it hasn't performed a query yet, and this happens in a loop.
I'm sure my API key isn't the problem; I've verified that. Any ideas on what could be the problem here?
...
{
"content": "Start by researching hallucination detection methods for Large Language Models. Look for academic papers, articles, and books about artificial intelligence and machine learning that discuss this topic. Summarize the findings and provide the sources for reference.",
"name": "orchestrator",
"role": "assistant"
},
{
"content": "Sure, I will start by searching for academic papers and articles on \"hallucination detection methods for Large Language Models\".",
"name": "web_surfer",
"role": "assistant"
},
{
"content": "Please provide a summary of your findings on hallucination detection methods for Large Language Models, along with the sources you used for your research.",
"name": "orchestrator",
"role": "assistant"
},
{
"content": "I apologize for the confusion, but I need to perform the search first. Let's start by searching for \"hallucination detection methods for Large Language Models\" in academic databases and scholarly articles.",
"name": "web_surfer",
"role": "assistant"
},
{
"content": "Please provide a summary of your findings on hallucination detection methods for Large Language Models, along with the sources you used for your research.",
"name": "orchestrator",
"role": "assistant"
},
{
"content": "I apologize for the confusion earlier. Let's start by searching for \"hallucination detection methods for Large Language Models\" in academic databases and scholarly articles.",
"name": "web_surfer",
"role": "assistant"
},
...
That is weird. Can you share the console_log? It has more info.
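As a quick sanity check outside of autogen, you could also hit the Bing Web Search endpoint directly with your key (this is not the exact code path web_surfer uses, just a check that the key and endpoint respond):

import os, requests

resp = requests.get(
    "https://api.bing.microsoft.com/v7.0/search",
    headers={"Ocp-Apim-Subscription-Key": os.environ["BING_API_KEY"]},
    params={"q": "hallucination detection methods for Large Language Models"},
)
print(resp.status_code)
for page in resp.json().get("webPages", {}).get("value", [])[:3]:
    print(page["name"], page["url"])

A 401/403 there would point at the key; a 200 would suggest the problem is in how the agent is invoking the search.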
I am stuck:
ModuleNotFoundError: No module named 'autogen.mdconvert'
mdconvert is a file that only exists in the gaia branch. Can you please provide more details about how you've set things up?
I just duplicated the file into scope. Now it all works.
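Concretely, what I did was roughly this (the source path is my guess based on the module name autogen.mdconvert; adjust it to wherever your gaia-branch checkout lives):
cp <gaia-branch-checkout>/autogen/mdconvert.py .
and then imported the local copy with import mdconvert instead of autogen.mdconvert.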
I can see that the code is very oriented toward running tasks in that jsonl format. I am going to get rid of the "FILE_NAME" thingy. Are there any considerations for auto-selecting which template (BasicTwoAgents, BasicTwoAgentsFunctionCalling, Orchestrator, SocietyOfMind) is most appropriate for a given task? When running the GAIA benchmark, did you pick the best result out of each template? How was this handled? How much did it cost to run the whole thing 😵💫?
So many questions, haha. Perhaps I can catch you in the next AutoGen Discord call :P
Let me just echo this point again: Once we've had a chance to do some ablation studies, we will simplify the installation process, and likely merge some components into main for general use. At present, this is all still very experimental.
Waiting eagerly
@afourney What should I do next? I was guessing I should run collate_results.py under Scripts, but it couldn't capture the answers:
(base) zhanwen@mini:~/.../scenarios/GAIA$ python Scripts/collate_results.py Results
{
"task_id": "gaia_validation_level_1__Orchestrator",
"trial": "27d5d136-8563-469e-92bf-fd103c28b57c",
"question": null,
"is_correct": false,
"model_answer": "",
"expected_answer": "NOT PROVIDED !#!#",
"reasoning_trace": []
}
{
"task_id": "gaia_validation_level_1__Orchestrator",
"trial": "ec09fa32-d03f-4bf8-84b0-1f16922c3ae4",
"question": null,
"is_correct": false,
"model_answer": "",
"expected_answer": "NOT PROVIDED !#!#",
"reasoning_trace": []
}
{
"task_id": "gaia_validation_level_1__Orchestrator",
"trial": "2d83110e-a098-4ebb-9987-066c06fa42d0",
"question": null,
"is_correct": false,
"model_answer": "",
"expected_answer": "NOT PROVIDED !#!#",
"reasoning_trace": []
}
{
"task_id": "gaia_validation_level_1__Orchestrator",
"trial": "cffe0e32-c9a6-4c52-9877-78ceb4aaa9fb",
"question": null,
"is_correct": false,
"model_answer": "",
"expected_answer": "NOT PROVIDED !#!#",
"reasoning_trace": []
}
{
"task_id": "gaia_validation_level_1__Orchestrator",
"trial": "42576abe-0deb-4869-8c63-225c2d75a95a",
"question": null,
"is_correct": false,
"model_answer": "",
"expected_answer": "NOT PROVIDED !#!#",
"reasoning_trace": []
}
{
"task_id": "gaia_validation_level_1__Orchestrator",
"trial": "cca530fc-4052-43b2-b130-b30968d8aa44",
"question": null,
"is_correct": false,
"model_answer": "",
"expected_answer": "NOT PROVIDED !#!#",
"reasoning_trace": []
}
{
"task_id": "gaia_validation_level_1__Orchestrator",
"trial": "72e110e7-464c-453c-a309-90a95aed6538",
"question": null,
"is_correct": false,
"model_answer": "",
"expected_answer": "NOT PROVIDED !#!#",
"reasoning_trace": []
}
{
"task_id": "gaia_validation_level_1__Orchestrator",
"trial": "8e867cd7-cff9-4e6c-867a-ff5ddc2550be",
"question": null,
"is_correct": false,
"model_answer": "",
"expected_answer": "NOT PROVIDED !#!#",
"reasoning_trace": []
}
{
"task_id": "gaia_validation_level_1__Orchestrator",
"trial": "46719c30-f4c3-4cad-be07-d5cb21eee6bb",
"question": null,
"is_correct": false,
"model_answer": "",
"expected_answer": "NOT PROVIDED !#!#",
"reasoning_trace": []
}
{
"task_id": "gaia_validation_level_1__Orchestrator",
"trial": "4b6bb5f7-f634-410e-815d-e673ab7f8632",
"question": null,
"is_correct": false,
"model_answer": "",
"expected_answer": "NOT PROVIDED !#!#",
"reasoning_trace": []
}
{
"task_id": "gaia_validation_level_1__Orchestrator",
"trial": "dc28cf18-6431-458b-83ef-64b3ce566c10",
"question": null,
"is_correct": false,
"model_answer": "",
"expected_answer": "NOT PROVIDED !#!#",
"reasoning_trace": []
}
{
"task_id": "gaia_validation_level_1__Orchestrator",
"trial": "5d0080cb-90d7-4712-bc33-848150e917d3",
"question": null,
"is_correct": false,
"model_answer": "",
"expected_answer": "NOT PROVIDED !#!#",
"reasoning_trace": []
}
{
"task_id": "gaia_validation_level_1__Orchestrator",
"trial": "b415aba4-4b68-4fc6-9b89-2c812e55a3e1",
"question": null,
"is_correct": false,
"model_answer": "",
"expected_answer": "NOT PROVIDED !#!#",
"reasoning_trace": []
}
{
"task_id": "gaia_validation_level_1__Orchestrator",
"trial": "e1fc63a2-da7a-432f-be78-7c4a95598703",
"question": null,
"is_correct": false,
"model_answer": "",
"expected_answer": "NOT PROVIDED !#!#",
"reasoning_trace": []
}
{
"task_id": "gaia_validation_level_1__Orchestrator",
"trial": "a1e91b78-d3d8-4675-bb8d-62741b4b68a6",
"question": null,
"is_correct": false,
"model_answer": "",
"expected_answer": "NOT PROVIDED !#!#",
"reasoning_trace": []
}
{
"task_id": "gaia_validation_level_1__Orchestrator",
"trial": "5cfb274c-0207-4aa7-9575-6ac0bd95d9b2",
"question": null,
"is_correct": false,
"model_answer": "",
"expected_answer": "NOT PROVIDED !#!#",
"reasoning_trace": []
}
{
"task_id": "gaia_validation_level_1__Orchestrator",
"trial": "b816bfce-3d80-4913-a07d-69b752ce6377",
"question": null,
"is_correct": false,
"model_answer": "",
"expected_answer": "NOT PROVIDED !#!#",
"reasoning_trace": []
}
Hi @afourney! I followed these steps and created a Docker image from the Dockerfile, but got stuck at this line: https://github.com/microsoft/autogen/blob/gaia_multiagent_v01_march_1st/samples/tools/autogenbench/autogenbench/run_cmd.py#L487 with the following issues:
===================================================================
sh: 0: cannot open run.sh: No such file
Running scenario Results/gaia_validation_level_1__Orchestrator/5a0c1adf-205e-4841-a666-7c3ef95def9d/0
Mounting:
[RW] 'Results/gaia_validation_level_1__Orchestrator/5a0c1adf-205e-4841-a666-7c3ef95def9d/0' => '/workspace'
[RW] '/data/xliu127/projects/security/vulnerability/nday/autogen' => '/autogen'
===================================================================
Any chance you know how to solve this? Could you possibly upload your docker image? Thanks in advance 🙏