
Empty triplets when using `with_structured_output` in a langchain agent

Open xavier-owkin opened this issue 1 month ago • 5 comments

When trying to train a langgraph agent with GRPO, I observe the following warnings when the agent uses langchain's `with_structured_output` method:

(TaskRunner pid=2743926) Warning: Reward is None for rollout ro-f38e23575c3a, will be auto-set to 0.0.
(TaskRunner pid=2743926) Warning: Length of triplets is 0 for rollout ro-f38e23575c3a.

This behavior does not appear when I use the base LLM without the `with_structured_output` method.

When inspecting the spans, I observed that when I use `with_structured_output` there is no span named "openai.chat.completion" (there is one without `with_structured_output`), and therefore no response_token_ids are saved. Instead I see spans named "chat_model.llm", like this:

Span(
        rollout_id="ro-f38e23575c3a",
        attempt_id="at-23e12663",
        sequence_id=2,
        trace_id="b513a90f765ffab2e70515045376b7ab",
        span_id="57767100b5bb63dd",
        parent_id="6f47a29b226edb52",
        name="chat_model.llm",
        status=TraceStatus(status_code="UNSET", description=None),
        attributes={
            "gen_ai.request.model": "Qwen/Qwen3-8B",
            "langchain.llm.model": "Qwen/Qwen3-8B",
            "gen_ai.prompt": "[]",
            "langchain.llm.name": ["langchain", "chat_models", "openai", "ChatOpenAI"],
            "langchain.chat_message.roles": "[]",
            "langchain.chat_model.type": "chat",
            "gen_ai.request.temperature": 1.0,
            "agentops.span.kind": "llm",
            "agentops.operation.name": "chat_model",
            "gen_ai.completion": '["{\\n\\n\\n  \\"field_1\\": \\"value_1\\",\\n  \\"field_2\\": \\"value2\\",\\n  \\"field_3\\": \\"value_3\\\n\\n  \\n}"]',
            "gen_ai.usage.completion_tokens": 276,
            "gen_ai.usage.prompt_tokens": 389,
            "gen_ai.usage.total_tokens": 665,
        },
        events=[],
        ...
)

Here is the kind of span I observe when I don't use `with_structured_output`:

Span(
        rollout_id="ro-f2da02b42afd",
        attempt_id="at-35588969",
        sequence_id=2,
        trace_id="e82fe2df35e304516c82855f718f0813",
        span_id="385252948874f324",
        parent_id="5bf0191efebccdbb",
        name="openai.chat.completion",
        status=TraceStatus(status_code="OK", description=None),
        attributes={
            "gen_ai.request.type": "chat",
            "gen_ai.system": "OpenAI",
            "gen_ai.request.model": "Qwen/Qwen3-8B",
            "gen_ai.request.temperature": 1.0,
            "gen_ai.request.streaming": False,
            "gen_ai.request.headers": "{'X-Stainless-Raw-Response': 'true'}",
            "gen_ai.prompt.0.role": "user",
            "gen_ai.prompt.0.content": 'LLM prompt',
            "gen_ai.response.id": "chatcmpl-edf09afd36744c66babdc9c1c244639c",
            "gen_ai.response.model": "hosted_vllm/Qwen/Qwen3-8B",
            "gen_ai.usage.total_tokens": 2317,
            "gen_ai.usage.prompt_tokens": 392,
            "gen_ai.usage.completion_tokens": 1925,
            "gen_ai.completion.0.finish_reason": "stop",
            "gen_ai.completion.0.role": "assistant",
            "gen_ai.completion.0.content": "<think>\nOkay, output of the LLM",
            "prompt_token_ids": [
                151644,
                872,
                271,

                198,
            ],
            "response_token_ids": [
                151667,
                198,
                32313,
                11,
                358,
 
            ],
        },
        events=[],
    ...
)

Note that using the langchain_callback_handler does not change anything.

Is this an issue with agent-lightning or rather with agentops? Do you have a workaround?

xavier-owkin avatar Nov 13 '25 15:11 xavier-owkin

I think the problem is related to vLLM. If you are using vLLM >= 0.10.2, please verify whether return_token_ids + structured output (called guided decoding) works with vLLM's native chat completion server. If it doesn't, please open an issue with the vLLM team and keep me in the loop.

Reference: https://docs.vllm.ai/en/stable/serving/openai_compatible_server.html#extra-parameters_1


ultmaster avatar Nov 14 '25 05:11 ultmaster

@ultmaster I am able to run the following code with vLLM 0.10.2

# `client` is an OpenAI client pointed at the vLLM server, and
# `pydantic_base_model` is the Pydantic model describing the schema.
completion = client.chat.completions.create(
    model="Qwen/Qwen3-8B",
    messages=[
        {
            "role": "user",
            "content": "user question",
        }
    ],
    response_format={
        "type": "json_schema",
        "json_schema": {"name": "my_schema", "schema": pydantic_base_model.model_json_schema()},
    },
    extra_body={
        "return_token_ids": True,
    },
)

The response from the vLLM server does contain a token_ids field with the response token ids.
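Since token_ids is a vLLM extension and not part of the OpenAI response schema, it is not a typed attribute on the SDK's ChatCompletion object; a minimal way to check it is to read the raw dump of the response, something along these lines:

# model_dump() keeps unknown fields, so the vLLM-specific token_ids
# field shows up in the dumped choices.
raw = completion.model_dump()
print(raw["choices"][0].get("token_ids"))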

However, I don't know how the langchain call llm.with_structured_output(pydantic_model).invoke(...) is translated into a request to vLLM.
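For context, the langchain side looks roughly like this (schema, prompt, and endpoint are placeholders):

from langchain_openai import ChatOpenAI
from pydantic import BaseModel

class MySchema(BaseModel):
    field_1: str
    field_2: str
    field_3: str

llm = ChatOpenAI(
    model="Qwen/Qwen3-8B",
    base_url="http://localhost:8000/v1",  # placeholder for the proxy / vLLM endpoint
    api_key="EMPTY",
    temperature=1.0,
)
# Depending on the method= argument, with_structured_output may issue a
# tool-calling request or set response_format=json_schema, so the request
# may not follow the same path as a plain chat completion.
structured_llm = llm.with_structured_output(MySchema, method="json_schema")
result = structured_llm.invoke("user question")  # returns a MySchema instance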

By the way, I just realised that the list of spans contains one named "raw_gen_ai_request", just before the "openai.chat.completion" one, with an attribute named "llm.hosted_vllm.choices" containing the following:

[
    {
        "finish_reason": "stop",
        "index": 0,
        "logprobs": None,
        "message": {
            "content": "llm output",
            "refusal": None,
            "role": "assistant",
            "annotations": None,
            "audio": None,
            "function_call": None,
            "tool_calls": [],
            "reasoning_content": None,
        },
        "stop_reason": None,
        "token_ids": [
            151667,
            198,
            32313,
            11,
            358,
            1184,
  
        ],
    }
]

This span is present both with and without structured outputs in langchain, and the token_ids are the same as the ones in the response_token_ids field. So my impression is that the response token ids are indeed captured, but they are not turned into triplets the way they are without `with_structured_output`.

xavier-owkin avatar Nov 14 '25 10:11 xavier-owkin

It's possible that structured output doesn't go through the chat completion call, so it isn't captured on the client side. But since the proxy server side is universal, the raw_gen_ai_request span always exists.

In that case, you can use the LlmProxyTraceToTriplet adapter and try again. To debug further, you would need to look deeper into the langchain implementation and the agentops tracer to see why the call fails to be completely traced.

ultmaster avatar Nov 14 '25 10:11 ultmaster

Can LlmProxyTraceToTriplet be used directly, without modifying agent-lightning code, to make training work? Could you provide such an example?

xavier-owkin avatar Nov 14 '25 11:11 xavier-owkin

Yes. You can grep the examples folder for examples.
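Roughly, the wiring looks something like the sketch below. This is untested, and the import path and Trainer arguments are assumptions; the examples folder is authoritative.

# Untested sketch: swap the default adapter for the proxy-side one so triplets
# are built from the raw_gen_ai_request spans recorded by the LLM proxy.
import agentlightning as agl
from agentlightning.adapter import LlmProxyTraceToTriplet  # import path is an assumption

trainer = agl.Trainer(
    algorithm=algorithm,               # your existing GRPO setup, unchanged
    adapter=LlmProxyTraceToTriplet(),  # assumes Trainer accepts an adapter argument
)
trainer.fit(agent, train_dataset)      # same call as before; train_dataset is a placeholder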

ultmaster avatar Nov 14 '25 16:11 ultmaster