Roadmap for AutoGen Evaluation
Considerations
- Differentiate between evaluating AutoGen itself and using AutoGen to evaluate something else
- Help decide whether a PR is helpful or breaks something
- Compare the value of simpler vs. more complex topologies (e.g., similar to OptiGuide and group chat)
#### Meta Issues
- [ ] Evaluate the value of simple vs. complex topologies (top priority: when are more than 2 agents needed?)
- [ ] Build a shared foundation in the framework for logging, creating metrics, etc.
- [ ] https://github.com/microsoft/autogen/issues/692
- [ ] Develop new AutoGen test suite
- [ ] https://github.com/microsoft/autogen/issues/691
- [ ] Add examples from bubblebench and Hugging Face datasets to the test suite
- [ ] Finalize notebooks and blog post on new metrics
- [ ] Evaluate large multi-modal models
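
To make the top-priority item concrete, here is a minimal sketch of how a simple and a complex topology could be compared on the same task set. It does not use any AutoGen API: `Task`, `success_rate`, `compare_topologies`, and the two stub runners are hypothetical names made up for illustration, and exact string match is only an assumed stand-in for whatever metrics the shared foundation ends up providing.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Task:
    prompt: str
    expected: str


# A runner is any callable that maps a prompt to a final answer string.
# In practice it would wrap a two-agent chat, a group chat, etc. (assumption).
Runner = Callable[[str], str]


def success_rate(runner: Runner, tasks: List[Task]) -> float:
    """Fraction of tasks whose answer exactly matches the expected string."""
    if not tasks:
        return 0.0
    passed = sum(1 for t in tasks if runner(t.prompt).strip() == t.expected)
    return passed / len(tasks)


def compare_topologies(runners: Dict[str, Runner], tasks: List[Task]) -> Dict[str, float]:
    """Run every topology on the same tasks and report one score per topology."""
    return {name: success_rate(runner, tasks) for name, runner in runners.items()}


# Hypothetical stand-ins for real agent setups, hard-coded for illustration.
def two_agent_stub(prompt: str) -> str:
    return "4" if prompt == "2+2" else "unknown"


def group_chat_stub(prompt: str) -> str:
    return {"2+2": "4", "3*3": "9"}.get(prompt, "unknown")


tasks = [Task("2+2", "4"), Task("3*3", "9")]
results = compare_topologies(
    {"two_agent": two_agent_stub, "group_chat": group_chat_stub}, tasks
)
print(results)
```

Keeping the runner interface to a plain callable means the same harness could score any topology without knowing how many agents sit behind it. A real comparison would also want cost and latency alongside success rate.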
Just to make sure: is the evaluation of large language models covered by the roadmap item "Evaluate large multi-modal models"? I'm really interested in how to choose the LLM.