How can I use Agent Lightning for fine-tuning an agent’s system prompt using OpenAI models (gpt-4o)? Can VERL be used for this?

Open rutvik-jaiswal-deeplearning opened this issue 1 month ago • 1 comment

I am trying to fine-tune an existing agent using Agent Lightning, specifically its system prompt (agent behavior).
My requirements:

  • I must use OpenAI models only, such as gpt-4o base.
  • I want to use VERL to optimize or update the agent’s prompt/behavior.
  • I prefer a minimal, validated, single-file example (or as simple as possible).

My Questions:

  1. Does Agent Lightning support fine-tuning or behavioral optimization using VERL when the underlying LLM is an OpenAI model?
  2. Is VERL compatible with Agent Lightning for updating prompts or performing reward-based optimization on an OpenAI-powered agent?
  3. Has VERL been used for prompt optimization of an LLM, and can you provide a validated minimal example demonstrating how to integrate VERL + Agent Lightning + OpenAI (gpt-4o) for fine-tuning an agent's system prompt?

  1. Fine-tuning / behavior optimization of an OpenAI model: yes. With VERL: no.
  2. VERL is an RL framework for training local models. To the best of my knowledge, you can't use VERL for prompt tuning.
  3. This is not possible, since VERL cannot be combined with an OpenAI-hosted model (see 1 and 2).
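
To illustrate why prompt tuning works with a hosted model while VERL does not, here is a minimal sketch of reward-based prompt selection. It is not the Agent Lightning APO API (see the apo example linked below for that); `select_best_prompt`, `fake_agent`, and `exact_match` are hypothetical names, and `fake_agent` is a stand-in for a real call to an OpenAI model. The point is that the loop only needs model outputs and a reward, never model weights.

```python
from typing import Callable, Sequence, Tuple


def select_best_prompt(
    candidates: Sequence[str],
    tasks: Sequence[Tuple[str, str]],
    run_agent: Callable[[str, str], str],
    reward: Callable[[str, str], float],
) -> Tuple[str, float]:
    """Return the candidate system prompt with the highest mean reward."""
    best_prompt, best_score = "", float("-inf")
    for prompt in candidates:
        # Score the prompt on every (task input, expected output) pair.
        scores = [reward(run_agent(prompt, task), expected) for task, expected in tasks]
        mean_score = sum(scores) / len(scores)
        if mean_score > best_score:
            best_prompt, best_score = prompt, mean_score
    return best_prompt, best_score


def fake_agent(system_prompt: str, task: str) -> str:
    # Hypothetical stand-in for a hosted model call; replace with a real client.
    return task.upper() if "uppercase" in system_prompt.lower() else task


def exact_match(output: str, expected: str) -> float:
    # Reward of 1.0 when the agent output matches the expected answer exactly.
    return 1.0 if output == expected else 0.0


candidates = ["You are a helpful assistant.", "Reply with the input in uppercase."]
tasks = [("hello", "HELLO"), ("abc", "ABC")]
best, score = select_best_prompt(candidates, tasks, fake_agent, exact_match)
```

A real prompt optimizer would also generate new candidate prompts from feedback rather than choosing from a fixed list; this sketch covers only the evaluate-and-select step.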

Related references:

  • Azure OpenAI Fine-tuning (with SFT) with Agent-lightning: https://github.com/microsoft/agent-lightning/tree/main/examples/azure
  • Introduction to VERL: https://github.com/microsoft/agent-lightning/tree/main/examples/azure
  • Prompt tuning examples (naturally work with GPT models): https://github.com/microsoft/agent-lightning/tree/main/examples/apo

ultmaster commented on Dec 02 '25 15:12