Roadmap for AutoGen Evaluation

Open gagb opened this issue 1 year ago • 1 comments

Considerations

  • Differentiate between evaluating AutoGen itself and using AutoGen to evaluate something else
  • Help decide whether a PR is helpful or breaks something
  • Compare the value of simple vs. complex topologies (e.g., similar to optiguide and group chat)

Roadmap for AutoGen Evaluation

#### Meta Issues
- [ ] Evaluate value of simple vs complex topologies (TOP Priority: when are >2 agents needed?)
- [ ] Build a shared foundation in the framework for logging, creating metrics, etc.
- [ ] https://github.com/microsoft/autogen/issues/692
- [ ] Develop new AutoGen test suite
- [ ] https://github.com/microsoft/autogen/issues/691
- [ ] Add examples from bubblebench and Hugging Face datasets to the test suite
- [ ] Finalize notebooks and blog post on new metrics
- [ ] Evaluate large multi-modal models

gagb, Nov 08 '23 23:11

Just to make sure: is the evaluation of large language models covered by the roadmap item "Evaluate large multi-modal models"? I'm really interested in how to choose the LLM.

susu3621, Jan 12 '24 08:01

stale

gagb, Mar 08 '24 19:03