Roadmap: Onboarding optimizers to Trace and creating a benchmark for generative optimizers
Trace provides a framework for programming agent architectures (parameterized by code, prompts, etc.) that can be trained by generative optimizers operating on computation graphs. Many LLM-based generative optimization and agent optimization algorithms have been proposed in the literature. In principle, many of them are compatible with the Trace setting, since they can be extended beyond their original goal (optimizing texts) to work on graphs directly. If we have reliable implementations of these optimizers in Trace, then we can
- Fairly compare their performance for research purposes. This addresses the issue that many experimental results in the literature are not directly comparable from an optimization algorithm's perspective, since they differ in agents and prompts. A shared implementation will help new research in generative optimization progress faster and improve its reproducibility.
- Provide a suite of readily usable tools for practitioners. If multiple optimizers can be used interchangeably, a system developer can quickly experiment with different techniques to improve their system. This would lower the barrier to using generative optimization techniques; currently, outside of Trace, switching algorithms means switching frameworks. A minimal sketch of what interchangeable optimizers could look like follows this list.
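As a rough illustration, here is a minimal sketch loosely following Trace's node/bundle/optimizer pattern, where swapping the optimization algorithm only changes one line. The module paths, `OptoPrime`, and the exact `zero_feedback`/`backward`/`step` calls are assumptions based on Trace's documented style and may differ from the current API; the agent logic is a placeholder.

```python
# Sketch (not a verbatim Trace example): a trainable prompt node is optimized
# from textual feedback, and the optimizer class is a swappable choice.
from opto.trace import node, bundle          # assumed module layout
from opto.optimizers import OptoPrime        # swap in another optimizer class here

# Trainable parameter: a system prompt represented as a node in the graph.
system_prompt = node("You are a helpful assistant.", trainable=True)

@bundle()  # records the call as an operation in the computation graph
def respond(prompt, question):
    # Placeholder agent logic; in practice this would call an LLM.
    return f"{prompt} | answering: {question}"

optimizer = OptoPrime([system_prompt])       # interchangeable: only this line changes

for question, feedback in [("What is 2+2?", "Too verbose; answer directly.")]:
    output = respond(system_prompt, question)
    optimizer.zero_feedback()
    optimizer.backward(output, feedback)     # propagate textual feedback through the graph
    optimizer.step()                         # the LLM-based optimizer updates the prompt
```

If every optimizer exposes the same construct/feedback/step interface, comparing techniques on a given system reduces to changing the optimizer constructor.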
To achieve this goal, we need
- Reliable implementations of generative optimization algorithms. Currently we have three in Trace. These can be made more reliable, and we can expand the set of options.
- A benchmark to test generative optimization algorithms. This arises as a necessary means of onboarding and debugging new optimizers. We can start by repurposing existing datasets that have been used in the literature and building evaluations of learning agents from them. Creating this benchmark will help us understand the performance of different optimization algorithms in the literature and support the development of new ones. A sketch of such a harness follows this list.
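To make the benchmarking idea concrete, here is a hypothetical harness sketch that runs each candidate optimizer on each task and collects scores for side-by-side comparison. The names (`Task`, `run_training`, the optimizer factories) are illustrative assumptions, not an existing Trace API.

```python
# Hypothetical benchmark harness: every optimizer is run on every task with a
# fresh agent, and final evaluation scores are collected for comparison.
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

@dataclass
class Task:
    name: str
    train_examples: List[Tuple]              # (input, feedback/label) pairs
    evaluate: Callable[[Callable], float]    # scores a trained agent, e.g. in [0, 1]

def run_benchmark(
    optimizer_factories: Dict[str, Callable],  # name -> fn(params) -> optimizer
    make_agent: Callable[[], Tuple],           # returns (agent_fn, trainable params)
    tasks: List[Task],
    run_training: Callable,                    # fn(agent_fn, params, optimizer, examples) -> trained agent_fn
) -> Dict[str, Dict[str, float]]:
    """Return {optimizer_name: {task_name: score}} for easy comparison."""
    results: Dict[str, Dict[str, float]] = {}
    for opt_name, make_optimizer in optimizer_factories.items():
        results[opt_name] = {}
        for task in tasks:
            agent_fn, params = make_agent()    # fresh agent per (optimizer, task) run
            optimizer = make_optimizer(params)
            trained_agent = run_training(agent_fn, params, optimizer, task.train_examples)
            results[opt_name][task.name] = task.evaluate(trained_agent)
    return results
```

Datasets repurposed from the literature would each be wrapped as a `Task`, so adding a new optimizer or a new dataset only means adding one entry to the corresponding list.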
Next steps:
- Create a list of algorithms to be implemented.
- Create a list of datasets to be used as tests.
I’d be interested in contributing to optimizers with human feedback. My main considerations are the API cost associated with experimenting on datasets and the time required to receive feedback from your side to ensure alignment with my work.
@doxav That's very helpful. We're in the process of preparing for this effort and will let you know once we have more concrete items. Regarding the questions you raised: 1. We will provide resources for benchmarking at a larger scale, e.g., by running those experiments on our end when needed and reporting back the results. 2. Once we start, we should set up a more real-time communication channel (e.g., Slack) so that we can move more efficiently.
Do you expect to do a first iteration in February? I will be working on optimization topics this month, so I need to know whether this can be part of my work plan. Thanks
Hi @doxav, we have a concrete plan now and would like to invite you to join; let's find some time to chat. I don't see your email on GitHub, so maybe you can send me an email at [email protected] and we can find a time.