
[Bug]: LiteLLM Instrumentation not working for Responses API

Open areibman opened this issue 7 months ago • 2 comments

Contact Details

No response

📦 Package Version

0.4.14

🎞️ Framework Version

1.72.0

🔎 Describe the Bug

Neither normal AgentOps instrumentation nor LiteLLM's success_callback argument works for litellm.responses (both work for litellm.completion). Success callback not working (see the docs: https://docs.litellm.ai/docs/observability/agentops_integration):

In [1]: import agentops
In [2]: import litellm

In [3]: litellm.success_callback = ["agentops"]

In [4]: agentops.init(auto_start_session=False)
   ...:

In [5]:
   ...:
   ...: # Non-streaming response
   ...: response = litellm.responses(
   ...:     model="openai/gpt-4o",
   ...:     input="Tell me a three sentence bedtime story about a unicorn.",
   ...:     max_output_tokens=100
   ...: )
   ...:
   ...: print(response)

-> No trace detected in AgentOps dashboard

Normal instrumentation fails as well:

In [3]: agentops.init()
🖇 AgentOps: Session Replay for default trace: https://app.agentops.ai/sessions?trace_id=24108c993e0f3b8b4d20b3e0b7e96c06
Out[3]: <agentops.legacy.Session at 0x1233f8cb0>

In [4]: import litellm

In [5]:
   ...: # Non-streaming response
   ...: response = litellm.responses(
   ...:     model="openai/gpt-4o",
   ...:     input="Tell me a three sentence bedtime story about a unicorn.",
   ...:     max_output_tokens=100
   ...: )
   ...:
   ...: print(response)

TODO:

  1. Fix instrumentation
  2. Update AgentOps docs if required
  3. Raise a PR and update LiteLLM's AgentOps docs if required

🤝 Contribution

  • [ ] Yes, I'd be happy to submit a pull request with these changes.
  • [ ] I need some guidance on how to contribute.
  • [ ] I'd prefer the AgentOps team to handle this update.

areibman avatar Jun 01 '25 20:06 areibman

The root cause is that the OpenAI Responses API uses different token usage field names (input_tokens, output_tokens) than the Chat Completions API (prompt_tokens, completion_tokens). Since we're relying on LiteLLM's OpenTelemetry span attribute extraction logic, the best approach would be to integrate with LiteLLM in a way that handles both naming schemes.
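
For illustration, here is a minimal sketch of the kind of normalization the exporter would need. The helper name and fallback logic are assumptions for this sketch, not AgentOps' actual exporter code:

# Hypothetical helper: map usage fields from either API shape to one schema
# before setting span attributes.
from typing import Any, Dict

def normalize_token_usage(usage: Dict[str, Any]) -> Dict[str, int]:
    """Accept Responses API usage (input_tokens/output_tokens) or Chat
    Completions usage (prompt_tokens/completion_tokens) and return a
    single consistent schema."""
    prompt = usage.get("prompt_tokens", usage.get("input_tokens", 0))
    completion = usage.get("completion_tokens", usage.get("output_tokens", 0))
    return {
        "prompt_tokens": prompt,
        "completion_tokens": completion,
        "total_tokens": usage.get("total_tokens", prompt + completion),
    }

# Both payload shapes normalize to the same result:
assert normalize_token_usage({"input_tokens": 12, "output_tokens": 88}) == \
       normalize_token_usage({"prompt_tokens": 12, "completion_tokens": 88})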

Because of this, our exporter fails with the following error: (DEBUG) 🖇 AgentOps: [opentelemetry.exporter.otlp.proto.http.trace_exporter] Transient error Service Unavailable encountered while exporting span batch, retrying in 1s.

fenilfaldu avatar Jun 10 '25 22:06 fenilfaldu

Might also help to add a separate span attribute that states whether the type is a "completion" or a "response".
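
As a rough sketch of what that could look like with the OpenTelemetry Python API; the tracer name, span name, and attribute key llm.request.api are all assumptions for illustration, not an existing AgentOps or LiteLLM convention:

# Hypothetical sketch: tag each LiteLLM span with the call type so the
# dashboard can distinguish Chat Completions calls from Responses calls.
from opentelemetry import trace

tracer = trace.get_tracer("agentops.litellm")

def record_call(api_type: str) -> None:
    # api_type would be "completion" for litellm.completion calls
    # and "response" for litellm.responses calls.
    with tracer.start_as_current_span("litellm.request") as span:
        span.set_attribute("llm.request.api", api_type)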

areibman avatar Jun 18 '25 16:06 areibman