Curious about the optimization process when integrating multiple LLMs
Hi, I notice that the examples in the project involve only a single LLM, so users only need to optimize its instructions and inputs.
However, I wonder what happens if we have a more complex system in which multiple LLMs interact with each other, like a GAN with a generator and a discriminator.
For example, a very simple case: one LLM generates test cases, such as math questions, and another LLM solves them. The goal could be to gradually increase the difficulty of the questions so that the solver LLM fails to give the correct answer, while at the same time improving the solver LLM's ability to answer hard questions correctly. It looks like training a GAN, but in an LLM setting.
For such an example, the variables we need to optimize include the instructions of both LLMs, and each LLM is affected by the other. When we train the generator of a GAN, we can use the gradients coming from the discriminator. Can we similarly use the (textual) gradients coming from the other LLM here?
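To make what I mean more concrete, here is a rough sketch of how I imagine this could look using textgrad's standard primitives (`tg.Variable`, `tg.BlackboxLLM`, `tg.TextLoss`, `tg.TGD`). The prompt texts, the loss wordings, and the GAN-style alternating update scheme are all my own assumptions, not something taken from the existing examples:

```python
import textgrad as tg

# Assumption: standard textgrad primitives; all prompt/loss wordings below are illustrative.
tg.set_backward_engine("gpt-4o", override=True)

# Trainable system prompts for both roles.
generator_prompt = tg.Variable(
    "Write one challenging high-school math question.",
    requires_grad=True,
    role_description="system prompt of the question generator",
)
solver_prompt = tg.Variable(
    "Solve the given math question step by step.",
    requires_grad=True,
    role_description="system prompt of the question solver",
)

generator = tg.BlackboxLLM("gpt-4o", system_prompt=generator_prompt)
solver = tg.BlackboxLLM("gpt-4o", system_prompt=solver_prompt)

# Natural-language "losses" for the two opposing objectives.
solver_loss = tg.TextLoss(
    tg.Variable(
        "Critique the correctness of this solution and point out any errors.",
        requires_grad=False,
        role_description="evaluation instruction for the solver",
    )
)
generator_loss = tg.TextLoss(
    tg.Variable(
        "Critique whether the question is hard enough: if the solution below is "
        "fully correct, explain how the question could be made harder.",
        requires_grad=False,
        role_description="evaluation instruction for the generator",
    )
)

sol_optimizer = tg.TGD(parameters=[solver_prompt])
gen_optimizer = tg.TGD(parameters=[generator_prompt])

for step in range(3):
    seed = tg.Variable(
        "Topic: algebra.", requires_grad=False, role_description="topic seed"
    )
    question = generator(seed)   # depends on generator_prompt
    answer = solver(question)    # depends on solver_prompt (and on question)

    # Alternate the updates, GAN-style: first improve the solver...
    sol_optimizer.zero_grad()
    loss = solver_loss(answer)
    loss.backward()              # textual gradients flow back to solver_prompt
    sol_optimizer.step()

    # ...then improve the generator against the current solver.
    gen_optimizer.zero_grad()
    loss = generator_loss(answer)
    loss.backward()              # gradients flow through answer -> question -> generator_prompt
    gen_optimizer.step()
```

Is alternating backward passes like this the intended way to let one LLM's feedback act as the "gradient" for the other one's instructions?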
You might want to take a look at the optimization of Chameleon: https://github.com/zou-group/textgrad/tree/main/examples/notebooks/chameleon
This is an agentic system that we describe in more detail in the paper: https://www.nature.com/articles/s41586-025-08661-4
Yes, I have studied this example, but it is still not very clear to me what happens during the optimization process. Is the gradient first generated at the leaf node, e.g. the answer generator module in the example, and then passed to its parent node, e.g. the solution generator module? Will the gradients of the solution generator module influence the gradients of the knowledge retrieval module? And how exactly does that happen?
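To make the question concrete, here is a minimal chained sketch of what I think the Chameleon-style pipeline looks like (retrieval -> solution -> answer). This is my own reconstruction with illustrative prompts, not the actual notebook code, and I would like to confirm whether `backward()` really propagates feedback through all three instructions like this:

```python
import textgrad as tg

tg.set_backward_engine("gpt-4o", override=True)

# One trainable instruction per module (wordings are illustrative).
retrieval_prompt = tg.Variable(
    "List the facts needed to answer the question.",
    requires_grad=True,
    role_description="instruction of the knowledge retrieval module",
)
solution_prompt = tg.Variable(
    "Draft a step-by-step solution using the retrieved facts.",
    requires_grad=True,
    role_description="instruction of the solution generator module",
)
answer_prompt = tg.Variable(
    "State the final answer based on the draft solution.",
    requires_grad=True,
    role_description="instruction of the answer generator module",
)

retriever = tg.BlackboxLLM("gpt-4o", system_prompt=retrieval_prompt)
solution_generator = tg.BlackboxLLM("gpt-4o", system_prompt=solution_prompt)
answer_generator = tg.BlackboxLLM("gpt-4o", system_prompt=answer_prompt)

question = tg.Variable(
    "What is 17 * 24?", requires_grad=False, role_description="user question"
)

# Forward pass: each module's output becomes the next module's input,
# so the three prompts end up connected in one computation graph.
facts = retriever(question)
draft = solution_generator(facts)
answer = answer_generator(draft)

loss_fn = tg.TextLoss(
    tg.Variable(
        "Evaluate whether the final answer is correct and well justified.",
        requires_grad=False,
        role_description="evaluation instruction",
    )
)
loss = loss_fn(answer)

# My understanding of the question: backward() starts from the loss on the final
# output, turns the critique of `answer` into feedback for answer_prompt, then
# propagates it through `draft` to solution_prompt and through `facts` to
# retrieval_prompt -- so the upstream modules receive gradients that are shaped
# by the downstream modules' gradients.
loss.backward()

optimizer = tg.TGD(parameters=[retrieval_prompt, solution_prompt, answer_prompt])
optimizer.step()
```

Is this the right mental model of how the gradients move from the answer generator back to the solution generator and then to the knowledge retrieval module?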