
I have several questions based on the RAG example project:

Open LY-today opened this issue 2 months ago • 26 comments

  1. The model file produced after training is in FSDP format (a sharded checkpoint that PyTorch FSDP saves for large-scale distributed training), which differs from the HuggingFace format (the universal distribution format used for inference, fine-tuning, and research). How can I convert the FSDP checkpoint into this common inference format so the agent can use it?
  2. In the post-training demo of this project, the agent is created by myself. How can I integrate Letta's agent service, or another already-deployed agent service?

LY-today avatar Sep 30 '25 04:09 LY-today

For your first question, I think you can find a bunch of references online. For example, this one: https://github.com/volcengine/verl/issues/1438
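For reference, recent verl releases ship a checkpoint merger for exactly this. The module path and flags below follow recent versions and may differ in yours, and the paths are placeholders, so treat this as a sketch and check the `--help` output first:

```shell
# Merge the sharded FSDP checkpoint saved by verl into HuggingFace format.
# Paths are placeholders; the module name and flags follow recent verl
# releases and may differ in older ones (check `python -m verl.model_merger --help`).
python -m verl.model_merger merge \
    --backend fsdp \
    --local_dir checkpoints/<run_name>/global_step_<N>/actor \
    --target_dir ./hf_model
```

The resulting directory should then load with `transformers.AutoModelForCausalLM.from_pretrained` or serve directly with vLLM.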

For the second question, I'm not familiar with Letta's agent service. If it supports dumping traces, it may be possible to convert the traces into trainable data. Our team has been investigating a similar case with Azure-hosted agents and will provide sample code in the future.

ultmaster avatar Sep 30 '25 09:09 ultmaster

@ultmaster Thank you for your reply. I have another question: how can I verify that the model is more accurate after reinforcement learning? Is there a standard evaluation scheme that shows the difference intuitively? Is there a reference link?

LY-today avatar Sep 30 '25 14:09 LY-today

@ultmaster Regarding question 2, I've provided Letta client code below that calls the agent service to obtain inference results. I wonder whether the RAG case in the agent-lightning project can directly call the Letta agent service to obtain inference results and then train on the training set. That way we could train the real, online agent instead of creating a mock agent in code as the RAG case does.

from letta_client import Letta

def connect_to_letta():
    """Connect to the Letta server."""
    try:
        # Option 1: connect to a local server
        client = Letta(base_url="http://127.0.0.1:8283")
        
        # Option 2: connect to Letta Cloud (if using the hosted service)
        # client = Letta(
        #     base_url="http://xxx:xxx",
        #     token="xxxx",
        #     project="default-project",
        # )
        
        print(f"Client connected successfully: {client}")
        return client
    except Exception as e:
        print(f"Connection failed: {e}")
        return None

def select_agent(client):
    """Let the user pick an agent."""
    try:
        agents = client.agents.list()
        
        if not agents:
            print("No agents found; please create one first")
            return None
        
        print(f"\nFound {len(agents)} agent(s):")
        for i, agent in enumerate(agents, 1):
            print(f"{i}. Agent ID: {agent.id}")
            if hasattr(agent, 'name') and agent.name:
                print(f"   Name: {agent.name}")
            if hasattr(agent, 'description') and agent.description:
                print(f"   Description: {agent.description}")
            print()
        
        while True:
            try:
                choice = input(f"Select an agent (1-{len(agents)}): ").strip()
                if choice.lower() == 'q':
                    return None
                
                choice_num = int(choice)
                if 1 <= choice_num <= len(agents):
                    selected_agent = agents[choice_num - 1]
                    print(f"\nSelected agent: {selected_agent.id}")
                    return selected_agent
                else:
                    print(f"Please enter a number between 1 and {len(agents)}")
            except ValueError:
                print("Please enter a valid number")
            except KeyboardInterrupt:
                print("\nSelection cancelled")
                return None
                
    except Exception as e:
        print(f"Error while listing agents: {e}")
        return None

def chat_with_agent(client, agent):
    """Hold an interactive conversation with the selected agent."""
    print(f"\nStarting a conversation with agent {agent.id}")
    print("Type 'quit' or 'q' to end the conversation")
    print("Type 'switch' to pick another agent")
    print("-" * 50)
    
    while True:
        try:
            question = input("\nYou: ").strip()
            
            if question.lower() in ['quit', 'q']:
                print("Ending conversation")
                break
            elif question.lower() == 'switch':
                return 'switch'
            elif not question:
                print("Please enter a question")
                continue
            
            print("Agent is thinking...")
            
            response = client.agents.messages.create(
                agent_id=agent.id,
                messages=[
                    {
                        "role": "user",
                        "content": question
                    }
                ]
            )
            
            print("\nAgent: ")
            for message in response.messages:
                if hasattr(message, 'content') and message.content:
                    print(f"{message.content}")
                    
        except KeyboardInterrupt:
            print("\n\nConversation interrupted")
            break
        except Exception as e:
            print(f"\nError while sending the message: {e}")
            print("Please retry, or type 'quit' to exit")
    
    return 'quit'

def main():
    # Connect to the Letta server
    client = connect_to_letta()
    if not client:
        return
    
    print("Welcome to the Letta agent chat!")
    
    while True:
        try:
            # Let the user pick an agent
            selected_agent = select_agent(client)
            if not selected_agent:
                print("No agent selected; exiting")
                break
            
            # Chat with the selected agent
            result = chat_with_agent(client, selected_agent)
            
            if result == 'quit':
                break
            elif result == 'switch':
                continue  # pick another agent
                
        except KeyboardInterrupt:
            print("\n\nInterrupted; exiting")
            break
        except Exception as e:
            print(f"Program error: {e}")
            print("Please check that:")
            print("1. the Letta server is running")
            print("2. the server address is correct")
            print("3. the network connection is working")
            print("4. letta-client is installed: pip install letta-client")
            print("\nIf using a local server, make sure you have run: letta server")
            break

if __name__ == "__main__":
    main()

LY-today avatar Sep 30 '25 15:09 LY-today

I have another question: How can I verify that the model is more accurate after reinforcement learning? Is there a standard evaluation scheme that shows the difference intuitively? Is there a reference link?

I think this is a VERL question; I suggest seeking help from their community.

Regarding question 2, I provided Letta client code that calls the agent service to obtain inference results. I wonder whether the RAG case in the agent-lightning project can directly call the Letta agent service to obtain inference results and then train on the training set. That way we could train the real, online agent instead of creating a mock agent in code as the RAG case does.

I see this line of code here:

            response = client.agents.messages.create(
                agent_id=agent.id,
                messages=[
                    {
                        "role": "user",
                        "content": question
                    }
                ]
            )

It seems that the agent runs on their side and we don't have access to the large language model there, so it may be difficult to get what you want. You might need to dig deeper into any additional features the platform provides.

ultmaster avatar Oct 01 '25 03:10 ultmaster

I have another question: How can I verify that the model is more accurate after reinforcement learning? Is there a standard evaluation scheme that shows the difference intuitively? Is there a reference link?

I think this is a VERL question; I suggest seeking help from their community.

Regarding question 2, I provided Letta client code that calls the agent service to obtain inference results. I wonder whether the RAG case in the agent-lightning project can directly call the Letta agent service to obtain inference results and then train on the training set. That way we could train the real, online agent instead of creating a mock agent in code as the RAG case does.

I see this line of code here:

        response = client.agents.messages.create(
            agent_id=agent.id,
            messages=[
                {
                    "role": "user",
                    "content": question
                }
            ]
        )

It seems that the agent runs on their side and we don't have access to the large language model there, so it may be difficult to get what you want. You might need to dig deeper into any additional features the platform provides.

@ultmaster Maybe I didn't make myself clear. Here I replace the code in rag_agent.py with the following code. Does that mean I can train an external agent that is already running? If that is the wrong approach, how can I train a real, running agent instead of writing an agent instance inside the code, as rag_agent.py does?

response = client.agents.messages.create(
    agent_id=agent.id,
    messages=[
        {
            "role": "user",
            "content": question
        }
    ]
)

LY-today avatar Oct 01 '25 03:10 LY-today

@ultmaster Is rag_agent.py a best practice, or just a demo? If it is a best practice, I understand that the external agent needs to be re-initialized following the format in rag_agent.py. If it is only a demo, I understand that rag_agent.py could call the external agent service's API to train the external agent.

LY-today avatar Oct 01 '25 04:10 LY-today

Maybe I didn't make myself clear. Here I replace the code in rag_agent.py with the following code. Does that mean I can train an external agent that is already running? If that is the wrong approach, how can I train a real, running agent instead of writing an agent instance inside the code, as rag_agent.py does?

The problem is with the "external" agent. Do we have access to the underlying LLM and to the real prompt-response pairs for tuning? From the term "agent service API", I guess you don't have that access? In that case, you may need to ask the agent service provider for the trace data.

ultmaster avatar Oct 01 '25 05:10 ultmaster

Maybe I didn't make myself clear. Here I replace the code in rag_agent.py with the following code. Does that mean I can train an external agent that is already running? If that is the wrong approach, how can I train a real, running agent instead of writing an agent instance inside the code, as rag_agent.py does?

The problem is with the "external" agent. Do we have access to the underlying LLM and to the real prompt-response pairs for tuning? From the term "agent service API", I guess you don't have that access? In that case, you may need to ask the agent service provider for the trace data.

@ultmaster An agent = LLM + tools. Does that mean it is not enough to provide the agent service itself, and agent-lightning also needs to see or call the LLM inside it separately?

LY-today avatar Oct 01 '25 15:10 LY-today

Yes. That's basically how RL works: you need the exact tokens fed into the LLM and the tokens it generates.

You can, however, use prompt tuning, described here: https://microsoft.github.io/agent-lightning/stable/quickstart/getting-started/ That should work without the raw prompts and responses.
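To make "tokens fed into the LLM and generated by the LLM" concrete, here is an illustrative sketch of the per-rollout data an RL trainer needs. The field names are hypothetical, not an agent-lightning or verl API:

```python
# Illustrative shape of the per-rollout data an RL trainer needs for an LLM
# policy. Field names are hypothetical; the point is that the token ids of the
# exact prompt and of the exact sampled response must both be available.
transition = {
    "prompt_token_ids": [151644, 872, 198],   # tokens actually fed to the LLM
    "response_token_ids": [791, 4320, 374],   # tokens the LLM actually sampled
    "reward": 1.0,                            # scalar score for the whole rollout
}

def is_trainable(t):
    """Usable for token-level RL only if both token sequences are present."""
    return bool(t["prompt_token_ids"]) and bool(t["response_token_ids"])

print(is_trainable(transition))  # True
```

A hosted agent API that returns only final text, with no token ids, cannot fill in the first two fields, which is why trace access matters.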

ultmaster avatar Oct 01 '25 16:10 ultmaster

@ultmaster I'm not sure whether you're a Chinese speaker; to make my question easier to understand, let me restate question 2. Letta is an agent management platform that makes it easy to create agents and integrate MCP tools. letta_client is the API Letta provides, which lets us call an agent service we created ourselves, in code, for natural-language inference. Now I want to do reinforcement learning on this Letta agent service. Based on the discussion above, is it not enough for agent-lightning to use letta_client to call the agent service, collect inference results, and score them against a test set? Must it see the LLM itself? By "LLM", do you mean an LLM service such as one deployed with vLLM, or the model files behind it? And for the "prompt-response pairs" you mentioned, is the response from the LLM service, from the agent, or from the tools?

LY-today avatar Oct 01 '25 16:10 LY-today

agent = Agent(
    model=LitellmModel(model="hosted_vllm/" + llm.model, base_url=llm.endpoint),  # underlying LLM model and API address
    model_settings=ModelSettings(
        max_tokens=512,      # cap max tokens to avoid exceeding the context limit
        temperature=0.7,     # sampling temperature, controls generation diversity
        stop=["</answer>"],
    ),
    name="Assistant",        # agent name
    instructions=agent_prompt,  # system prompt guiding the agent's behavior
    mcp_servers=[server],    # attach the WebQA retrieval MCP server
)

@ultmaster So it looks like if I want to do RL on the Letta agent, I need to restructure it following the format above: first deploy the model files as an LLM service with vLLM and expose its endpoint, then provide the mcp_servers separately. Is that right?

LY-today avatar Oct 01 '25 16:10 LY-today

First you need to find out whether Letta supports plugging in a custom, self-deployed vLLM. The core of RL is obtaining the prompt-response pairs, where "response" means the tokens generated by the LLM. If the vLLM is self-deployed there is still hope; as a fallback you can hack vLLM to extract the tokens. If it is not self-deployed, it depends on whether Letta offers an online service that can trace token records.

So it looks like if I want to do RL on the Letta agent, I need to restructure it following the format above: first deploy the model files as an LLM service with vLLM and expose its endpoint, then provide the mcp_servers separately. Is that right?

Exactly. The key is still whether you can use your own vLLM.

You can also join the Discord group linked on the homepage.
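For the self-deployed route, a minimal way to stand up an OpenAI-compatible LLM endpoint is vLLM's built-in server; the model id below is a placeholder for your own checkpoint:

```shell
# Serve a HuggingFace-format model behind an OpenAI-compatible API.
# Replace the model id/path with your own, e.g. a merged post-training checkpoint.
vllm serve Qwen/Qwen2.5-7B-Instruct --port 8000
# The endpoint is then http://localhost:8000/v1, usable as the base_url
# of the LitellmModel shown in the snippet above.
```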

ultmaster avatar Oct 01 '25 17:10 ultmaster


First you need to find out whether Letta supports plugging in a custom, self-deployed vLLM. The core of RL is obtaining the prompt-response pairs, where "response" means the tokens generated by the LLM. If the vLLM is self-deployed there is still hope; as a fallback you can hack vLLM to extract the tokens. If it is not self-deployed, it depends on whether Letta offers an online service that can trace token records.

So it looks like if I want to do RL on the Letta agent, I need to restructure it following the format above: first deploy the model files as an LLM service with vLLM and expose its endpoint, then provide the mcp_servers separately. Is that right?

Exactly. The key is still whether you can use your own vLLM.

You can also join the Discord group linked on the homepage.

@ultmaster Thank you very much for the patient explanation. I'll join the group and look into whether Letta supports a custom LLM. One more question, about the point of post-training: is it right to understand it like this? Say that before RAG post-training the vector database contains an entry with question 111 and answer 222. After post-training, the new model, deployed as an LLM service, can answer 222 when asked 111 without relying on retrieval (the vector database). In other words, if the model is trained sufficiently and converges well, it can answer directly and skip the RAG stage; if the model is under-trained and performs poorly, RAG is still needed to augment the context.

LY-today avatar Oct 01 '25 23:10 LY-today

You would probably need to ask colleagues who work on RAG. My understanding is that what the model learns is not necessarily the retrieved knowledge itself, but the ability to write queries and use the RAG tool. If you want it to learn the knowledge directly, you can construct other tasks, such as QA.

ultmaster avatar Oct 02 '25 07:10 ultmaster


First you need to find out whether Letta supports plugging in a custom, self-deployed vLLM. The core of RL is obtaining the prompt-response pairs, where "response" means the tokens generated by the LLM. If the vLLM is self-deployed there is still hope; as a fallback you can hack vLLM to extract the tokens. If it is not self-deployed, it depends on whether Letta offers an online service that can trace token records.

So it looks like if I want to do RL on the Letta agent, I need to restructure it following the format above: first deploy the model files as an LLM service with vLLM and expose its endpoint, then provide the mcp_servers separately. Is that right?

Exactly. The key is still whether you can use your own vLLM. You can also join the Discord group linked on the homepage.

Thank you very much for the patient explanation. I'll join the group and look into whether Letta supports a custom LLM. One more question, about the point of post-training: is it right to understand it like this? Say that before RAG post-training the vector database contains an entry with question 111 and answer 222. After post-training, the new model, deployed as an LLM service, can answer 222 when asked 111 without relying on retrieval (the vector database). In other words, if the model is trained sufficiently and converges well, it can answer directly and skip the RAG stage; if the model is under-trained and performs poorly, RAG is still needed to augment the context.

In RAG post-training we are interested in two things:

  1. For a given query, the LLM learns how to make search decisions (how to search, and whether to search at all)
  2. From the retrieved results, the LLM learns how to extract the important information (part of today's reranking/chunking work can be offloaded to the LLM)

As for the LLM learning the knowledge itself, I think pre-training would be more effective, though there is plenty of debate in this area.

syeehyn avatar Oct 04 '25 04:10 syeehyn

@ultmaster To confirm: in rag_agent.py, llm.model should be replaced with the HuggingFace model directory, and base_url with the endpoint of the LLM service inside the agent, right?

And in train.sh, does that mean vLLM no longer needs to be configured, so no extra LLM service has to be started?

LY-today avatar Oct 10 '25 06:10 LY-today

As far as I remember, verl's vLLM cannot be turned off; even if you skip vLLM you still have to use SGLang. If your case needs a third-party LLM, you probably cannot use verl.

ultmaster avatar Oct 10 '25 07:10 ultmaster

@ultmaster Besides verl, does agent-lightning support anything else, especially for a case like mine?

LY-today avatar Oct 10 '25 08:10 LY-today

Probably not. As discussed earlier, the core problem is that you are using an agent and LLM provided by a third-party service, so you cannot get the traces needed for training. If you come up with a good solution, feel free to share it.

ultmaster avatar Oct 10 '25 09:10 ultmaster

@ultmaster I've obtained the traces. How should I convert them into a training set usable for reinforcement learning?

LY-today avatar Oct 11 '25 07:10 LY-today

@ultmaster I've obtained the traces. How should I convert them into a training set usable for reinforcement learning?

@ultmaster The format is as follows:

{
  "data": {
    "start_time": "string",
    "end_time": "string",
    "hours": "integer",
    "jobs": [
      {
        "id": "string",
        "user_id": "string",
        "status": "string",
        "created_at": "string",
        "completed_at": "string",
        "metadata": {
          "job_type": "string",
          "agent_id": "string",
          "background": "integer"
        },
        "updated_at": "string",
        "is_deleted": "integer",
        "job_type": "string",
        "request_config": {
          "use_assistant_message": "integer",
          "assistant_message_tool_name": "string",
          "assistant_message_tool_kwarg": "string",
          "include_return_message_types": "null"
        },
        "ttft_ns": "integer",
        "total_duration_ns": "integer"
      }
    ],
    "steps": [
      {
        "id": "string",
        "organization_id": "string",
        "job_id": "string",
        "agent_id": "string",
        "provider_name": "string",
        "provider_category": "string",
        "model": "string",
        "model_endpoint": "string",
        "context_window_limit": "integer",
        "completion_tokens": "integer",
        "prompt_tokens": "integer",
        "total_tokens": "integer",
        "stop_reason": "string",
        "tags": [],
        "trace_id": "string",
        "error_data": "null",
        "status": "string",
        "created_at": "string",
        "updated_at": "string",
        "is_deleted": "integer"
      }
    ],
    "messages": [
      {
        "id": "string",
        "role": "string",
        "content": [
          {
            "type": "string",
            "text": "string"
          }
        ],
        "model": "string",
        "tool_calls": [
          {
            "id": "string",
            "function": {
              "arguments": "string",
              "name": "string"
            },
            "type": "string"
          }
        ],
        "tool_call_id": "string",
        "step_id": "string",
        "tool_returns": [],
        "sequence_id": "integer",
        "agent_id": "string",
        "organization_id": "string",
        "created_at": "string",
        "updated_at": "string",
        "is_deleted": "integer"
      }
    ],
    "summary": {
      "job_count": "integer",
      "step_count": "integer",
      "message_count": "integer"
    }
  },
  "end_time": "string",
  "start_time": "string",
  "time_range": "string"
}
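Assuming the schema above, one rough way to flatten such a trace into trainable records is to walk `data.messages` and emit each assistant turn together with the conversation that preceded it. Everything below (the function name, the single task-level reward) is an illustrative sketch, not an agent-lightning API. Note the caveat from earlier in the thread: this yields text-level pairs only, since the trace stores token counts (`prompt_tokens`, `completion_tokens`) rather than the token ids that token-level RL ultimately needs.

```python
def trace_to_records(trace, task_reward):
    """Flatten a Letta-style trace (schema as above) into chat-format records.
    Reward assignment is a placeholder: one task-level score is copied onto
    every assistant turn. This gives text-level pairs only; the trace records
    token counts, not the token ids that token-level RL requires."""
    records = []
    history = []
    for msg in trace["data"]["messages"]:
        role = msg["role"]
        # Concatenate the text parts of the message content
        text = "".join(part.get("text", "") for part in (msg.get("content") or []))
        if role == "assistant":
            records.append({
                "prompt": list(history),              # conversation before this turn
                "response": text,                     # what the model produced
                "tool_calls": msg.get("tool_calls") or [],
                "reward": task_reward,
            })
        history.append({"role": role, "content": text})
    return records

# Toy trace following the schema above
toy = {"data": {"messages": [
    {"role": "user", "content": [{"type": "text", "text": "hi"}]},
    {"role": "assistant", "content": [{"type": "text", "text": "hello"}]},
]}}
print(trace_to_records(toy, 1.0)[0]["response"])  # hello
```

In a real pipeline the per-rollout reward would come from comparing the final answer against your test set, rather than being passed in as a constant.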

LY-today avatar Oct 11 '25 09:10 LY-today

@ultmaster Can traces like this be used for reinforcement learning?

LY-today avatar Oct 13 '25 06:10 LY-today

For your first question, I think you can find a bunch of references online. For example, this one: volcengine/verl#1438

For the second question, I'm not familiar with Letta's agent service. If it supports dumping traces, it may be possible to convert the traces into trainable data. Our team has been investigating a similar case with Azure-hosted agents and will provide sample code in the future.

@ultmaster Hello. How far along is the case of doing agent reinforcement learning from external traces?

LY-today avatar Oct 27 '25 09:10 LY-today

Please see this document on how to return custom traces from agents: https://microsoft.github.io/agent-lightning/stable/tutorials/write-agents/#return-values-from-agents

ultmaster avatar Oct 27 '25 09:10 ultmaster

@ultmaster So the idea is to use @rollout to capture the trace chain of something like a Letta agent, and then feed those traces into agent-lightning for agent reinforcement learning?

LY-today avatar Oct 27 '25 11:10 LY-today

May I set up an unofficial WeChat or Xiaohongshu group? I'm not really used to Discord.

XianglongTan avatar Oct 28 '25 09:10 XianglongTan