langchainrb
Allow streaming only the chunks of the final response for assistants
Is your feature request related to a problem? Please describe.
When I'm using assistants that call tools with streaming, I have to check whether the stream has reached the final response before I yield each chunk back.
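For illustration, a minimal sketch of the workaround this currently requires, assuming an OpenAI-style streaming payload where tool-call deltas carry a `tool_calls` key; the exact chunk shape depends on the LLM adapter in use:

```ruby
# Hypothetical workaround: filter chunks by hand inside the streaming
# block. The "choices"/"delta" shape below is an assumption based on
# OpenAI's streaming format; other adapters may differ.
stream_final_only = lambda do |chunk|
  delta = chunk.dig("choices", 0, "delta") || {}
  # Suppress chunks that belong to a tool call rather than the final answer
  next if delta.key?("tool_calls")
  print delta["content"] if delta["content"]
end
```

Every consumer has to re-implement this filtering today; the option proposed below would move it into the assistant itself.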
Describe the solution you'd like
Add an only_stream_final_chunks option to the assistant API.
Override the chat_with_llm method to make sure we don't yield unnecessary chunks (tool-call chunks, for example).
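A rough sketch of what this could look like from the caller's side. The only_stream_final_chunks option is the proposal itself, and passing a streaming block to the Assistant constructor is an assumption for illustration, not a shipped langchainrb API:

```ruby
require "langchain"

llm = Langchain::LLM::OpenAI.new(api_key: ENV["OPENAI_API_KEY"])

# only_stream_final_chunks: true is the proposed (hypothetical) option;
# the streaming block on the constructor is also assumed here.
assistant = Langchain::Assistant.new(
  llm: llm,
  instructions: "You are a helpful assistant",
  tools: [Langchain::Tool::Calculator.new],
  only_stream_final_chunks: true
) do |chunk|
  # With the option enabled, this block would only receive chunks of the
  # final response; tool-call chunks would be dropped inside the
  # overridden chat_with_llm.
  print chunk
end

assistant.add_message(content: "What is 2 + 2?")
assistant.run
```

Keeping the filtering inside chat_with_llm means the existing streaming interface is unchanged and the behavior is opt-in per assistant.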