[bug] output from llama_index tracing doesn't propagate up to the agent
See screenshot. I think this happens for streaming responses specifically.
steps to reproduce:
- run the llama_index example app.
🤖
Hey @mikeldking, good to see you again! Let's take a look at what's happening here.
Based on the provided context, it seems like the issue you're experiencing might be due to the way streaming responses are handled in the OpenInferenceTraceCallbackHandler. In the on_event_end method, if the response is a streaming response, the event data is attached to the stream object, which is responsible for ending the span when the stream is finished. This is done to defer the addition of the output value attribute, which is only available when the stream is finished, and that can happen a lot later than when on_trace_end is called.
Here is the relevant code:
```python
if (
    not is_straggler
    and _is_streaming_response(response := payload.get(EventPayload.RESPONSE))
    and response.response_gen is not None
):
    response.response_gen = _ResponseGen(response.response_gen, event_data)
    event_data.is_dispatched = True
```
If the stream is not properly consumed or if there's an error in the stream, the span might not be ended correctly, and the output from llama_index tracing might not propagate up to the agent.
Please ensure that the stream is being consumed correctly and without errors. If the issue persists, it might be a bug in the OpenInferenceTraceCallbackHandler or in the way llama_index handles streaming responses. You can find more details in the OpenInferenceTraceCallbackHandler code.
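For reference, here is a minimal sketch of fully consuming a streaming query response so the deferred span can be ended. It assumes a llama_index 0.10-style import layout and a local `./data` directory; adjust the imports for your version.

```python
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

# Hypothetical setup: build an index over local documents.
documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents)

query_engine = index.as_query_engine(streaming=True)
streaming_response = query_engine.query("What does the document say about tracing?")

# The span for the LLM call is only ended once the generator is exhausted,
# so make sure the stream is consumed to completion.
for token in streaming_response.response_gen:
    print(token, end="", flush=True)
```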
As for the structure of the llama_index tracing output, it is structured as a series of events, each with an associated span. Each span contains a set of attributes that provide detailed information about the event. The exact structure and content of the tracing output can vary depending on the event type and the specific details of the event. However, the general structure includes the SpanAttributes, EventId, EventData, and StatusCode.
The _ResponseGen class appears to be a wrapper for the response_gen attribute of a StreamingResponse object. When a streaming response is detected, _ResponseGen wraps the response's response_gen, presumably to add tracing to the generation process and to defer span finalization until the stream has completed.
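To illustrate the pattern, here is a hypothetical sketch of such a wrapper; this is not the actual _ResponseGen implementation, and `end_span` is an assumed helper on the event data rather than a real API.

```python
from typing import Iterator, List


class WrappedResponseGen:
    """Hypothetical sketch of a generator wrapper that defers span
    finalization until the stream has been fully consumed."""

    def __init__(self, response_gen: Iterator[str], event_data) -> None:
        self._response_gen = response_gen
        self._event_data = event_data
        self._tokens: List[str] = []

    def __iter__(self) -> Iterator[str]:
        try:
            for token in self._response_gen:
                self._tokens.append(token)
                yield token
        finally:
            # The complete output text is only known here, so the output
            # attribute is recorded and the span is ended at this point.
            self._event_data.end_span(output="".join(self._tokens))
```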
I hope this helps! If you have any more questions or need further clarification, feel free to ask.
This is because the agent has a separate code path for the streaming response gen that is not detected by our methods, and is very difficult to monkey patch.
Would this also be the root cause of the issue I see here, where I can trace the entire set of calls end-to-end, but when using them as part of a ChatEngine subclassed from BaseChatEngine, I get split traces between the embedding/retrieval calls and the LLM call?
@edhenry I've never seen the tool you're using in the screenshot! Are you using OpenInference with jaeger?
I'd be surprised if these issues are related as this ticket is mainly around not being able to propagate the output of a streaming llm response to a root span that is an agent. However you never know! Could you share a bit of the code for your ChatEngine?
I am getting split traces
Split traces are usually a symptom of an issue with the callback manager in Llama-Index. We would love it if you could open an issue with further details on that. Thanks.
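For anyone else hitting split traces, a rough sketch of wiring a single callback manager through all components is below. It assumes the legacy Phoenix OpenInferenceTraceCallbackHandler and the llama_index global Settings API; import paths may differ across versions.

```python
from llama_index.core import Settings
from llama_index.core.callbacks import CallbackManager
from phoenix.trace.llama_index import OpenInferenceTraceCallbackHandler

# One handler, one callback manager, shared globally so that the
# embedding/retrieval spans and the LLM span end up in the same trace.
callback_handler = OpenInferenceTraceCallbackHandler()
Settings.callback_manager = CallbackManager([callback_handler])

# Components constructed after this point (indexes, retrievers, chat
# engines) inherit the global callback manager unless one is passed
# explicitly, which is a common source of split traces.
```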
I can confirm it was related to how I was structuring some of my objects. Apologies for jumping in on this issue as it isn't related!
I've never seen the tool you're using in the screenshot! Are you using OpenInference with jaeger?
I am and it works quite well out of the box. 🙂
Punting as callbacks are being deprecated
