
fix(litellm): enable recording of streaming and async responses

Open andypalmi opened this issue 1 month ago • 2 comments

Pull Request Description

Title: Fix: Enable automatic recording for LiteLLM streaming and async calls

Description: This PR addresses Issue #91, where LiteLLM streaming responses (using `stream=True` or `acompletion`) were not being recorded in the Memori database.

Previously, the integration relied on a synchronous `success_callback` function, but LiteLLM does not trigger synchronous callbacks for asynchronous or streaming operations.

Changes

  1. Refactored litellm_integration.py:

    • Replaced the standalone callback function with a MemoriLogger class inheriting from litellm.integrations.custom_logger.CustomLogger.
    • Implemented async_log_success_event to handle async/streaming completion events.
    • LiteLLM automatically aggregates streaming chunks into the response_obj passed to this logger, allowing us to record the full conversation without manual chunk accumulation.
  2. Added Verification Test:

    • Created tests/litellm_support/test_streaming_simple.py.
    • This test uses acompletion with stream=True.
    • It utilizes LiteLLM's mock_response feature to verify the recording logic without incurring API costs or hitting rate limits.
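As a rough sketch of the refactor in change 1 (hedged: the `recorder` callable and the dict-style response access are illustrative assumptions, not the actual Memori code; the import guard is only there so the sketch runs standalone):

```python
import asyncio

try:
    from litellm.integrations.custom_logger import CustomLogger
except ImportError:
    class CustomLogger:  # minimal stand-in so the sketch runs without litellm installed
        pass


class MemoriLogger(CustomLogger):
    """Handles LiteLLM success events, including async and streaming calls."""

    def __init__(self, recorder):
        # `recorder` is a hypothetical callable standing in for Memori's
        # database write; the real integration persists to the Memori database.
        self.recorder = recorder

    async def async_log_success_event(self, kwargs, response_obj, start_time, end_time):
        # For stream=True calls, LiteLLM hands this hook the already
        # aggregated response, so no manual chunk accumulation is needed.
        request_messages = kwargs.get("messages", [])
        reply = response_obj["choices"][0]["message"]["content"]
        self.recorder({"request": request_messages, "response": reply})
```

Registration would then look something like `litellm.callbacks = [MemoriLogger(...)]`, which is how LiteLLM picks up `CustomLogger` subclasses.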

How Has This Been Tested? I ran the new test, tests/litellm_support/test_streaming_simple.py.

  • Scenario: Simulated a gpt-4o streaming response.
  • Result: Verified that the conversation was successfully written to the SQLite database and that the metadata correctly identified the call as a stream.
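For context on what "LiteLLM automatically aggregates streaming chunks" means, here is a plain-Python illustration of how streamed deltas combine into one message (the chunk shapes mirror the OpenAI streaming format; this is not Memori or LiteLLM code):

```python
# Each streamed chunk carries a partial "delta"; joining the deltas in
# order reconstructs the full assistant message.
chunks = [
    {"choices": [{"delta": {"content": "Hel"}}]},
    {"choices": [{"delta": {"content": "lo, "}}]},
    {"choices": [{"delta": {"content": "world!"}}]},
    {"choices": [{"delta": {}}]},  # final chunk typically carries no content
]

full_text = "".join(
    chunk["choices"][0]["delta"].get("content", "")
    for chunk in chunks
)
print(full_text)  # -> Hello, world!
```

The point of the fix is that this accumulation happens inside LiteLLM before the logger hook fires, so the integration can record `full_text` directly.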

Checklist

  • [ ] My code follows the style guidelines of this project
  • [x] I have performed a self-review of my own code
  • [x] I have added tests that prove my fix is effective
  • [ ] New and existing unit tests pass locally with my changes -> not done yet

andypalmi avatar Nov 22 '25 03:11 andypalmi

PR Documentation Suggestions

Category: API change
Document new async/streaming support in LiteLLM

Update the LiteLLM integration documentation to include information about the new
MemoriLogger class and its support for async and streaming responses.

File: docs/integrations/litellm_integration.md

````diff
--- a/docs/integrations/litellm_integration.md
+++ b/docs/integrations/litellm_integration.md
@@ -1,11 +1,15 @@
 ### LiteLLM Integration
 
-The LiteLLM integration allows for automatic recording of conversations into Memori.
+The LiteLLM integration allows for automatic recording of conversations into Memori. With the introduction of the `MemoriLogger` class, the integration now supports both synchronous and asynchronous/streaming responses.
 
 **Usage:**
 
 ```python
 from memori import Memori
 
 memori = Memori(...)
 memori.enable()  # Automatically registers LiteLLM callbacks
 ```
+
+**New Features:**
+- `MemoriLogger` class: Inherits from `CustomLogger` to handle both sync and async/streaming events.
+- Automatic aggregation of streaming chunks into a single response object for recording.
````

Suggestion importance[1-10]: 9


Why: This suggestion updates existing documentation to include critical information about new async and streaming support in the LiteLLM integration, which is a significant user-facing change. It accurately reflects the code changes and provides valuable context for users leveraging the new MemoriLogger class.

Impact: High

Category: User guide
Add LiteLLM streaming test documentation

Add a section to the testing documentation explaining the new test for LiteLLM
streaming support.

File: docs/testing.md

````diff
--- a/docs/testing.md
+++ b/docs/testing.md
@@ -1,3 +1,15 @@
 ### Testing
 
 This section covers how to run tests for the Memori project.
+
+**LiteLLM Streaming Test:**
+
+A new test `tests/litellm_support/litellm_test_streaming.py` has been added to verify the recording logic for LiteLLM's async streaming responses. This test uses the `acompletion` function with `stream=True` and utilizes LiteLLM's `mock_response` feature to simulate responses without API costs.
+
+To run the test:
+
+```bash
+python tests/litellm_support/litellm_test_streaming.py
+```
+
+This test ensures that conversations are correctly recorded in the database and that the metadata identifies the call as a stream.
````


Suggestion importance[1-10]: 7


Why: The suggestion adds valuable information to the testing documentation by explaining a new test for LiteLLM streaming support. It guides users on how to run the test, which verifies important functionality related to async streaming responses, enhancing the overall testing documentation. However, it does not update existing content, so it receives a slightly lower score.

Impact: Medium

Any Thoughts on My New Idea?

I'm experimenting with automatically updating documentation as code changes. I won't post this on this project again without permission. I'm looking for some honest feedback on whether this is useful or not. I know the logic could give better suggestions, but I want to check whether I'm barking up the right tree.

hayden3456 avatar Nov 26 '25 03:11 hayden3456

+1 - tried to use Memori, and because we are using LiteLLM streaming, I'm not seeing anything recorded. This seems to be a step in the right direction; would love to see this feature.

genesis-gh-agorshkov avatar Nov 26 '25 23:11 genesis-gh-agorshkov