Roadmap for AutoGen Evaluation

Open • gagb opened this issue 1 year ago • 1 comment

Considerations

- Differentiate between evaluating AutoGen itself and using AutoGen to evaluate something else
- Help decide whether a PR is helpful or breaks something
- Compare the value of simple versus complex topologies (e.g., similar to OptiGuide and group chat)

Roadmap for AutoGen Evaluation

#### Meta Issues
- [ ] Evaluate value of simple vs complex topologies (TOP Priority: when are >2 agents needed?)
- [ ] Shared foundation in the framework for logging, creating metrics, etc.
- [ ] https://github.com/microsoft/autogen/issues/692
- [ ] Develop new AutoGen test suite
- [ ] https://github.com/microsoft/autogen/issues/691
- [ ] Add examples from bubblebench and Hugging Face dataset into the test suite
- [ ] Finalize notebooks and blog post on new metrics
- [ ] Evaluate large multi-modal models

Nov 08 '23 23:11 gagb

Just to make sure: is the evaluation of large language models covered by the roadmap item "Evaluate large multi-modal models"? I'm really interested in how to choose the LLM.

Jan 12 '24 08:01 susu3621

stale

Mar 08 '24 19:03 gagb
