langchainrb
Allow streaming only the chunks of the final response for assistants
Is your feature request related to a problem? Please describe.
When I'm using assistants that call tools with streaming, I have to check whether the stream has reached the final response before I yield each chunk back.
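For illustration, a minimal sketch of the workaround this currently requires, assuming an OpenAI-style streaming payload where tool-call deltas carry a `tool_calls` key; the exact chunk shape depends on the LLM adapter in use:

```ruby
# Hypothetical workaround: filter chunks by hand inside the streaming
# block. The "choices"/"delta" shape below is an assumption based on
# OpenAI's streaming format; other adapters may differ.
stream_final_only = lambda do |chunk|
  delta = chunk.dig("choices", 0, "delta") || {}
  # Suppress chunks that belong to a tool call rather than the final answer
  next if delta.key?("tool_calls")
  print delta["content"] if delta["content"]
end
```

Every consumer has to re-implement this filtering today; the option proposed below would move it into the assistant itself.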
Describe the solution you'd like
Add an only_stream_final_chunks option to the assistant API.
Override the chat_with_llm method to make sure we don't yield unnecessary chunks (tool-call chunks, for example).
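A rough sketch of what this could look like from the caller's side. The only_stream_final_chunks option is the proposal itself, and passing a streaming block to the Assistant constructor is an assumption for illustration, not a shipped langchainrb API:

```ruby
require "langchain"

llm = Langchain::LLM::OpenAI.new(api_key: ENV["OPENAI_API_KEY"])

# only_stream_final_chunks: true is the proposed (hypothetical) option;
# the streaming block on the constructor is also assumed here.
assistant = Langchain::Assistant.new(
  llm: llm,
  instructions: "You are a helpful assistant",
  tools: [Langchain::Tool::Calculator.new],
  only_stream_final_chunks: true
) do |chunk|
  # With the option enabled, this block would only receive chunks of the
  # final response; tool-call chunks would be dropped inside the
  # overridden chat_with_llm.
  print chunk
end

assistant.add_message(content: "What is 2 + 2?")
assistant.run
```

Keeping the filtering inside chat_with_llm means the existing streaming interface is unchanged and the behavior is opt-in per assistant.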