How can I use Agent Lightning for fine-tuning an agent’s system prompt using OpenAI models (gpt-4o)? Can VERL be used for this?

Open · rutvik-jaiswal-deeplearning opened this issue 1 month ago · 1 comment

I am trying to fine-tune an existing agent using Agent Lightning, specifically its system prompt (agent behavior).
My requirements:

I must use OpenAI models only, such as gpt-4o base.
I want to use VERL to optimize or update the agent’s prompt/behavior.
I prefer a minimal, validated, single-file example (or as simple as possible).

Does Agent Lightning support fine-tuning or behavioral optimization using VERL when the underlying LLM is an OpenAI model?
Is VERL compatible with Agent Lightning for updating prompts or performing reward-based optimization on an OpenAI-powered agent?
Has VERL been used for prompt optimization of an LLM, and can you share a validated minimal example demonstrating how to integrate VERL + Agent Lightning + OpenAI (gpt-4o) for fine-tuning an agent system prompt?

Dec 02 '25 13:12 rutvik-jaiswal-deeplearning

Fine-tuning / behavior optimization of an OpenAI model: yes. With VERL: no.
VERL is an RL framework for local models: it trains model weights, which a hosted OpenAI model does not expose. To the best of my knowledge, you can't use VERL for prompt tuning.
It's not possible.
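
For intuition on why prompt tuning still works with a hosted model: a prompt optimizer only needs to call the model and score its output, so no access to weights is required. Below is a minimal Python sketch of that idea, assuming a stubbed model call (`fake_agent`) in place of a real gpt-4o request. All names here are hypothetical, and this is plain best-of-N selection over candidate prompts, not Agent Lightning's APO algorithm or API; see the apo example linked below for the real thing.

```python
from typing import Callable, Sequence, Tuple


def select_best_prompt(
    candidates: Sequence[str],
    tasks: Sequence[Tuple[str, str]],
    run_agent: Callable[[str, str], str],
    reward: Callable[[str, str], float],
) -> Tuple[str, float]:
    """Return the candidate system prompt with the highest mean reward.

    The model is treated as a black box: we only call it and score outputs.
    """
    best_prompt, best_score = "", float("-inf")
    for prompt in candidates:
        scores = [reward(run_agent(prompt, task), expected) for task, expected in tasks]
        mean_score = sum(scores) / len(scores)
        if mean_score > best_score:
            best_prompt, best_score = prompt, mean_score
    return best_prompt, best_score


def fake_agent(system_prompt: str, task: str) -> str:
    # Hypothetical stand-in for a hosted model call (e.g. gpt-4o).
    # It upper-cases the input only when the system prompt asks for it.
    return task.upper() if "upper" in system_prompt.lower() else task


def exact_match(output: str, expected: str) -> float:
    # Toy reward: 1.0 for an exact match, 0.0 otherwise.
    return 1.0 if output == expected else 0.0


TASKS = [("hello", "HELLO"), ("agent", "AGENT")]
CANDIDATES = ["Repeat the input.", "Repeat the input in upper case."]

best, score = select_best_prompt(CANDIDATES, TASKS, fake_agent, exact_match)
print(best, score)
```

Swapping `fake_agent` for a real API call and generating new candidates from the scored results would turn this into an iterative optimizer; VERL, by contrast, would need gradients through the model itself.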

Related references:

Azure OpenAI fine-tuning (with SFT) with Agent Lightning: https://github.com/microsoft/agent-lightning/tree/main/examples/azure
Introduction to VERL: https://github.com/microsoft/agent-lightning/tree/main/examples/azure
Prompt tuning examples (naturally work with GPT models): https://github.com/microsoft/agent-lightning/tree/main/examples/apo

Dec 02 '25 15:12 ultmaster