requestTrace in the gateway plugin cache does not support streaming requests
🐛 Describe the bug
requestTrace currently only reports data for non-streaming requests. It should work for both streaming and non-streaming vLLM requests.
Steps to Reproduce
No response
Expected behavior
No response
Environment
No response
Request trace is now logged for streaming requests as well.
In the streaming scenario, the total token count is reported in the second-to-last stream chunk. When HandleResponseBody receives the chunk with total tokens set, it logs the request trace.
@zhangjyr Can you check the response, and if no action is required, close the issue.
Right now the request trace is added on EndOfStream, but for streaming it needs to be added at the (n-1)-th stream chunk. cc https://github.com/vllm-project/aibrix/issues/790 - we can add support once we have a separate feature flag for heterogeneous.
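To illustrate the point above, here is a minimal sketch (not the actual plugin code) of how a response-body handler could detect the usage-bearing chunk in an OpenAI-style SSE stream instead of waiting for EndOfStream. The `Usage`/`chunk` types and `extractStreamUsage` helper are hypothetical names; the assumption is that vLLM, with `stream_options.include_usage` enabled, emits the `usage` object in the second-to-last chunk, right before `data: [DONE]`.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// Usage mirrors the OpenAI-style usage object that the streaming response
// carries in its second-to-last chunk when usage reporting is enabled.
type Usage struct {
	PromptTokens     int `json:"prompt_tokens"`
	CompletionTokens int `json:"completion_tokens"`
	TotalTokens      int `json:"total_tokens"`
}

// chunk models just the field we care about in each SSE data payload.
type chunk struct {
	Usage *Usage `json:"usage"`
}

// extractStreamUsage scans SSE "data:" lines and returns the usage from the
// chunk that carries it (the (n-1)-th chunk), or nil if none is present.
func extractStreamUsage(body string) *Usage {
	for _, line := range strings.Split(body, "\n") {
		line = strings.TrimSpace(line)
		if !strings.HasPrefix(line, "data:") {
			continue
		}
		payload := strings.TrimSpace(strings.TrimPrefix(line, "data:"))
		if payload == "" || payload == "[DONE]" {
			continue
		}
		var c chunk
		if err := json.Unmarshal([]byte(payload), &c); err != nil {
			continue // skip malformed chunks
		}
		if c.Usage != nil && c.Usage.TotalTokens > 0 {
			return c.Usage
		}
	}
	return nil
}

func main() {
	stream := "data: {\"choices\":[{\"delta\":{\"content\":\"hi\"}}]}\n" +
		"data: {\"choices\":[],\"usage\":{\"prompt_tokens\":3,\"completion_tokens\":5,\"total_tokens\":8}}\n" +
		"data: [DONE]\n"
	if u := extractStreamUsage(stream); u != nil {
		// This is the point where the gateway would add the request
		// trace, instead of waiting for EndOfStream.
		fmt.Println("total_tokens:", u.TotalTokens)
	}
}
```

The key design point is that the trace is emitted as soon as a chunk with a populated `usage` object arrives, so streaming requests are accounted for even though token totals never appear in the final `[DONE]` frame.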
This is completed.