
LLMProxy doesn't support stream

Open · ultmaster opened this issue 2 months ago · 1 comment

We can't get token IDs from the LLM Proxy when streaming is enabled.

vLLM does return token_ids in its streaming responses. The problem lies in LiteLLM and has three parts:

  1. `.venv/lib/python3.12/site-packages/litellm/litellm_core_utils/streaming_handler.py` has different implementations for `__next__` and `__anext__`. I think `__anext__` forgets to call the success-logging callback for each chunk and only calls it once for the whole response (see the sketch after this list).
  2. Still in this file, `chunk_creator` directly discards `token_ids` from the raw chunk, so the token IDs we need go missing.
  3. There is no handler for `stream_event` in `.venv/lib/python3.12/site-packages/litellm/integrations/opentelemetry.py`, so we never receive anything in the store.
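
For illustration only, not LiteLLM's actual code: a schematic of the asymmetry described in point 1, where the sync iterator invokes the success-logging callback for every chunk while the async iterator invokes it only once after the stream is exhausted, so per-chunk metadata such as `token_ids` never reaches the logger. The class and callback names are made up for the sketch.

```python
class StreamWrapperSketch:
    """Schematic only; mimics the sync/async asymmetry described above."""

    def __init__(self, raw_chunks, on_chunk_logged):
        self._chunks = iter(raw_chunks)
        self._on_chunk_logged = on_chunk_logged  # success-logging callback

    def __iter__(self):
        return self

    def __next__(self):
        chunk = next(self._chunks)
        # Sync path: the callback fires for every chunk, so per-chunk
        # token_ids could be recorded here if they survived chunk creation.
        self._on_chunk_logged(chunk)
        return chunk

    def __aiter__(self):
        return self

    async def __anext__(self):
        try:
            chunk = next(self._chunks)
        except StopIteration:
            # Async path: the callback fires only once, for the assembled
            # response, so per-chunk information is never logged.
            self._on_chunk_logged({"complete_response": True})
            raise StopAsyncIteration
        return chunk
```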

A systematic fix for this issue is complex. A simpler workaround might be to turn off streaming via some guardrail middleware and fake a single stream chunk once the non-streaming response is ready; a sketch follows.
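
A minimal sketch of that workaround, assuming a hypothetical `call_llm` stand-in for the proxy's non-streaming completion path (not a real LiteLLM hook): force `stream=False` upstream so the backend returns `token_ids` intact, then replay the finished response as a single fake server-sent-event chunk so downstream consumers that expect a stream keep working.

```python
import json
from typing import Any, Dict, Iterator


def fake_stream(request: Dict[str, Any], call_llm) -> Iterator[str]:
    # Force a non-streaming call so the backend's full response
    # (including token_ids) comes back in one complete body.
    non_streaming_request = {**request, "stream": False}
    response = call_llm(non_streaming_request)

    # Replay the complete response as a single SSE chunk, followed by
    # the usual [DONE] marker, so callers iterating a stream still work.
    yield f"data: {json.dumps(response)}\n\n"
    yield "data: [DONE]\n\n"
```

Whether this lives in a guardrail, a pre-call hook, or a separate proxy layer is an open question; the point is only that the fake chunk is synthesized after the full response, with its token IDs, is already available.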

ultmaster · Nov 03 '25 07:11

Partially resolved by #293.

ultmaster · Nov 29 '25 14:11