fix(litellm): enable recording of streaming and async responses
Pull Request Description
Title: Fix: Enable automatic recording for LiteLLM streaming and async calls
## Description

This PR addresses Issue #91, where LiteLLM streaming responses (using `stream=True` or `acompletion`) were not being recorded in the Memori database.

Previously, the integration relied on a synchronous `success_callback` function, but LiteLLM does not trigger synchronous callbacks for asynchronous or streaming operations.
## Changes

- **Refactored `litellm_integration.py`:**
  - Replaced the standalone callback function with a `MemoriLogger` class inheriting from `litellm.integrations.custom_logger.CustomLogger`.
  - Implemented `async_log_success_event` to handle async/streaming completion events.
  - LiteLLM automatically aggregates streaming chunks into the `response_obj` passed to this logger, allowing us to record the full conversation without manual chunk accumulation.
- **Added a verification test:**
  - Created `tests/litellm_support/test_streaming_simple.py`.
  - The test uses `acompletion` with `stream=True`.
  - It uses LiteLLM's `mock_response` feature to verify the recording logic without incurring API costs or hitting rate limits.
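The logger change can be sketched roughly as follows. This is a minimal illustration, not the actual Memori implementation: `record_fn` is a hypothetical stand-in for Memori's database write, and the `ImportError` fallback exists only so the sketch runs even where `litellm` is not installed.

```python
import asyncio

try:
    from litellm.integrations.custom_logger import CustomLogger
except ImportError:
    # Stub base class so the sketch is runnable without litellm installed.
    class CustomLogger:
        pass


class MemoriLogger(CustomLogger):
    """Hypothetical sketch of the CustomLogger subclass described above.

    record_fn stands in for Memori's actual conversation-recording call.
    """

    def __init__(self, record_fn):
        super().__init__()
        self.record_fn = record_fn

    async def async_log_success_event(self, kwargs, response_obj, start_time, end_time):
        # LiteLLM invokes this hook for async and streaming calls, and it
        # aggregates streaming chunks into response_obj first, so no manual
        # chunk accumulation is needed here.
        messages = kwargs.get("messages", [])
        reply = response_obj["choices"][0]["message"]["content"]
        self.record_fn(messages, reply)
```

Registration would then be a matter of appending an instance to `litellm.callbacks` (rather than the old `litellm.success_callback` list, which is only fired synchronously).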
## How Has This Been Tested?

I ran the new test suite `tests/litellm_support/test_streaming_simple.py`.

- **Scenario:** Simulated a `gpt-4o` streaming response.
- **Result:** Verified that the conversation was successfully written to the SQLite database and that the metadata correctly identified the call as a stream.
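The shape of that check can be sketched as follows. This assumes `litellm` is installed; the model name and prompt are illustrative, and `mock_response` is LiteLLM's built-in way to fake a completion without an API key or network call.

```python
import asyncio


async def run_streaming_check():
    import litellm  # imported lazily so the sketch fails cleanly if absent

    stream = await litellm.acompletion(
        model="gpt-4o",
        messages=[{"role": "user", "content": "ping"}],
        mock_response="pong",  # faked reply; no API cost or rate limits
        stream=True,
    )
    # Consume the streamed chunks; after the stream completes, a registered
    # async logger receives the aggregated response and can record it.
    parts = []
    async for chunk in stream:
        delta = chunk.choices[0].delta.content
        if delta:
            parts.append(delta)
    return "".join(parts)


if __name__ == "__main__":
    print(asyncio.run(run_streaming_check()))
```

In the real test, an assertion against the SQLite database would follow the stream consumption, confirming the conversation row was written.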
## Checklist
- [ ] My code follows the style guidelines of this project
- [x] I have performed a self-review of my own code
- [x] I have added tests that prove my fix is effective
- [ ] New and existing unit tests pass locally with my changes (not done yet)
## PR Documentation Suggestions

| Category | Suggestion | Impact |
| --- | --- | --- |
| Api change | **Document new async/streaming support in LiteLLM.** Update the LiteLLM integration documentation to include information about the new File: <br><br>**Suggestion importance [1-10]: 9.** Why: This suggestion updates existing documentation to include critical information about new async and streaming support in the LiteLLM integration, which is a significant user-facing change. It accurately reflects the code changes and provides valuable context for users leveraging the new | High |
| User guide | **Add LiteLLM streaming test documentation.** Add a section to the testing documentation explaining the new test for LiteLLM File: <br><br>**Suggestion importance [1-10]: 7.** Why: The suggestion adds valuable information to the testing documentation by explaining a new test for LiteLLM streaming support. It guides users on how to run the test, which verifies important functionality related to async streaming responses, enhancing the overall testing documentation. However, it does not update existing content, so it receives a slightly lower score. | Medium |
## Any Thoughts on My New Idea?

I'm experimenting with automatically updating documentation as code changes. I won't post this again on this project without permission. I'm looking for honest feedback on whether this is useful. I know the logic could give better suggestions, but I want to check whether I'm barking up the right tree.
+1 - I tried to use Memori, and because we are using LiteLLM streaming, I'm not seeing anything recorded. This seems like a step in the right direction; I would love to see this feature.