[bug] output from llama_index tracing doesn't propagate up to the agent
See screenshot. I think this happens for streaming responses specifically.
steps to reproduce:
- run the llama_index example app.
🤖
Hey @mikeldking, good to see you again! Let's take a look at what's happening here.
Based on the provided context, it seems like the issue you're experiencing might be due to the way streaming responses are handled in the OpenInferenceTraceCallbackHandler. In the on_event_end method, if the response is a streaming response, the event data is attached to the stream object, which is responsible for ending the span when the stream is finished. This is done to defer the addition of the output value attribute, which is only available when the stream is finished, and that can happen a lot later than when on_trace_end is called.
Here is the relevant code:
```python
if (
    not is_straggler
    and _is_streaming_response(response := payload.get(EventPayload.RESPONSE))
    and response.response_gen is not None
):
    response.response_gen = _ResponseGen(response.response_gen, event_data)
    event_data.is_dispatched = True
```
If the stream is not properly consumed or if there's an error in the stream, the span might not be ended correctly, and the output from llama_index tracing might not propagate up to the agent.
Please ensure that the stream is being consumed correctly and without errors. If the issue persists, it might be a bug in the OpenInferenceTraceCallbackHandler or in the way llama_index handles streaming responses. You can find more details in the OpenInferenceTraceCallbackHandler code.
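For reference, here is a minimal sketch of fully consuming a streaming query response so the deferred span can be ended. It assumes a llama_index 0.10-style import layout and a local `./data` directory; adjust the imports for your version.

```python
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

# Hypothetical setup: build an index over local documents.
documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents)

query_engine = index.as_query_engine(streaming=True)
streaming_response = query_engine.query("What does the document say about tracing?")

# The span for the LLM call is only ended once the generator is exhausted,
# so make sure the stream is consumed to completion.
for token in streaming_response.response_gen:
    print(token, end="", flush=True)
```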
As for the structure of the llama_index tracing output, it is structured as a series of events, each with an associated span. Each span contains a set of attributes that provide detailed information about the event. The exact structure and content of the tracing output can vary depending on the event type and the specific details of the event. However, the general structure includes the SpanAttributes, EventId, EventData, and StatusCode.
The _ResponseGen class appears to be a wrapper for the response_gen attribute of a StreamingResponse object. When a streaming response is detected, _ResponseGen wraps the response's response_gen, presumably to add tracing to the generation process and to defer span finalization until the stream has completed.
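To illustrate the pattern, here is a hypothetical sketch of such a wrapper; this is not the actual _ResponseGen implementation, and `end_span` is an assumed helper on the event data rather than a real API.

```python
from typing import Iterator, List


class WrappedResponseGen:
    """Hypothetical sketch of a generator wrapper that defers span
    finalization until the stream has been fully consumed."""

    def __init__(self, response_gen: Iterator[str], event_data) -> None:
        self._response_gen = response_gen
        self._event_data = event_data
        self._tokens: List[str] = []

    def __iter__(self) -> Iterator[str]:
        try:
            for token in self._response_gen:
                self._tokens.append(token)
                yield token
        finally:
            # The complete output text is only known here, so the output
            # attribute is recorded and the span is ended at this point.
            self._event_data.end_span(output="".join(self._tokens))
```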
I hope this helps! If you have any more questions or need further clarification, feel free to ask.
This is because the agent has a separate code path for the streaming response gen that is not detected by our methods, and is very difficult to monkey patch.
Would this also be the root cause of the issue I see here, where I can trace the entire set of calls end-to-end, but when using them as part of a ChatEngine subclassed from BaseChatEngine, I get split traces between the embedding/retrieval calls and the LLM call?
@edhenry I've never seen the tool you're using in the screenshot! Are you using OpenInference with jaeger?
I'd be surprised if these issues are related as this ticket is mainly around not being able to propagate the output of a streaming llm response to a root span that is an agent. However you never know! Could you share a bit of the code for your ChatEngine?
I am getting split traces
Split traces are usually a symptom of an issue with the callback manager in Llama-Index. We would love it if you could open an issue with further details on that. Thanks.
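For anyone else hitting split traces, a rough sketch of wiring a single callback manager through all components is below. It assumes the legacy Phoenix OpenInferenceTraceCallbackHandler and the llama_index global Settings API; import paths may differ across versions.

```python
from llama_index.core import Settings
from llama_index.core.callbacks import CallbackManager
from phoenix.trace.llama_index import OpenInferenceTraceCallbackHandler

# One handler, one callback manager, shared globally so that the
# embedding/retrieval spans and the LLM span end up in the same trace.
callback_handler = OpenInferenceTraceCallbackHandler()
Settings.callback_manager = CallbackManager([callback_handler])

# Components constructed after this point (indexes, retrievers, chat
# engines) inherit the global callback manager unless one is passed
# explicitly, which is a common source of split traces.
```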
I can confirm it was related to how I was structuring some of my objects. Apologies for jumping in on this issue as it isn't related!
I've never seen the tool you're using in the screenshot! Are you using OpenInference with jaeger?
I am and it works quite well out of the box. 🙂
Punting as callbacks are being deprecated
