
Capture / replay interactions with OpenAI / LangChain

gpsaggese opened this issue • 8 comments

Specs

Problem Statement

  • Our software uses OpenAI's API through get_completion() in helpers/hopenai.py for LLM operations
  • Running tests for it would incur OpenAI API charges due to frequent API calls (hundreds of times daily from CI and developers)
  • Test results would lack determinism because OpenAI responses may vary for the same input
  • We need a way to "freeze" expected outputs for reliable testing

Current Implementation

  • No tests implemented for hopenai.get_completion() due to the reasons described above
  • Caching through functools.lru_cache for hopenai.get_completion() (see the sketch after this list)
  • Cost tracking exists but no mechanism to avoid costs during testing
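
For reference, the current lru_cache approach looks roughly like the sketch below (the signature is hypothetical; the real hopenai.py may differ). The in-memory cache only deduplicates calls within a single process, so it does not avoid costs across test runs:

import functools

from openai import OpenAI

_client = OpenAI()


@functools.lru_cache(maxsize=None)
def get_completion(user_prompt: str, system_prompt: str, model: str) -> str:
    # Each unique argument tuple hits the API once per process, but the cache
    # is lost on exit, so CI still pays for every run.
    response = _client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
    )
    return response.choices[0].message.content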

Proposed Implementation

The proposed implementation should reuse ./helpers/hcache_simple.py.

Phase 1: Basic Capture and Replay Functionality

  1. Create a capture mechanism to record OpenAI API interactions
  2. Implement a replay mechanism to use recorded responses instead of live API calls
  3. Add configuration options to control capture/replay behavior
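
A minimal sketch of step 3, assuming a hypothetical HOPENAI_CACHE_MODE environment variable and anticipating the four cache modes listed under Architecture below (the actual configuration mechanism is left to the proposal PR):

import enum
import os


class CacheMode(enum.Enum):
    DISABLED = "DISABLED"
    CAPTURE = "CAPTURE"
    REPLAY = "REPLAY"
    FALLBACK = "FALLBACK"


def get_cache_mode() -> CacheMode:
    # Hypothetical env var so CI can force REPLAY without code changes;
    # default to DISABLED for normal interactive usage.
    return CacheMode(os.environ.get("HOPENAI_CACHE_MODE", "DISABLED"))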

Sample cache file

{
  "version": "1.0",
  "metadata": {
    "created_at": "2023-06-15T10:30:45Z",
    "last_updated": "2023-06-15T14:22:10Z"
  },
  "entries": {
    "cache_key_1": {
      "request": {
        "user_prompt": "What is machine learning?",
        "system_prompt": "You are a helpful AI assistant.",
        "model": "gpt-4o-mini",
        "temperature": 0.1,
        "additional_params": {
          "max_tokens": 500,
          "top_p": 1.0
        }
      },
      "response": {
        "content": "Machine learning is a subset of artificial intelligence...",
        "usage": {
          "prompt_tokens": 25,
          "completion_tokens": 150,
          "total_tokens": 175
        },
        "cost": 0.00123,
        "timestamp": "2023-06-15T10:30:45Z"
      }
    }
  }
}

  1. Version: Tracks the cache format version for future compatibility
  2. Metadata: Contains information about the cache file itself
    • created_at: When the cache file was first created
    • last_updated: When the cache was last modified
  3. Entries: A dictionary of cached interactions, where:
    • Each key is a unique identifier generated from the request parameters
    • Each value contains:
      • Request: Complete information about the original request
        • user_prompt: The user's input
        • system_prompt: The system instructions
        • model: The OpenAI model used
        • temperature: The temperature setting
        • additional_params: Any other parameters passed to the API
      • Response: Complete information about the API response
        • content: The actual text response from the model
        • usage: Token usage information
        • cost: The calculated cost of the API call
        • timestamp: When the response was received

Cache Key Generation

The cache key should be a deterministic hash of the request parameters to ensure consistent retrieval. The key generation should:

  1. Include all parameters that affect the response (user_prompt, system_prompt, model, temperature, etc.)
  2. Normalize inputs (e.g., trim whitespace, standardize formatting)
  3. Use a consistent hashing algorithm (e.g., SHA-256)
  4. Be collision-resistant for different inputs
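
A minimal sketch of such a key generator, assuming the request fields from the sample cache file above (the name generate_cache_key is illustrative, not an existing helper):

import hashlib
import json
from typing import Any, Dict, Optional


def generate_cache_key(
    user_prompt: str,
    system_prompt: str,
    model: str,
    temperature: float,
    additional_params: Optional[Dict[str, Any]] = None,
) -> str:
    # Normalize inputs so that semantically identical requests hash equally.
    payload = {
        "user_prompt": user_prompt.strip(),
        "system_prompt": system_prompt.strip(),
        "model": model,
        "temperature": temperature,
        "additional_params": additional_params or {},
    }
    # Serialize with sorted keys for a deterministic byte representation,
    # then hash with SHA-256 for collision resistance.
    serialized = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(serialized.encode("utf-8")).hexdigest()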

Phase 2: Integration with Testing Framework

  1. Integrate with our testing framework to automatically use replay mode during tests
  2. Add utilities to manage and update the cache files
  3. Add tests for hopenai.get_completion() with replay mode

Phase 3: Advanced Features

  1. Implement intelligent cache invalidation
  2. Create tools for cache inspection

Acceptance Criteria

  • Zero OpenAI API calls during test runs when using replay mode
  • Identical test results across multiple runs when using the same cache
  • Simple interface to enable/disable capture and replay modes
  • Ability to update cached responses when needed
  • Minimal changes required to existing code that uses hopenai.get_completion()
  • Comprehensive documentation for developers

Architecture

Classes and Components

  1. OpenAICache

    • Responsible for storing and retrieving cached responses
    • Methods: save_to_cache(), load_from_cache(), has_cached_response()
    • Uses JSON files for storage
  2. CacheKey Generator

    • Creates unique keys based on input parameters
    • Handles normalization of inputs for consistent key generation
  3. Supported cache modes

    • DISABLED: No caching
    • CAPTURE: Make API calls and save responses to cache
    • REPLAY: Use cached responses, fail if not in cache
    • FALLBACK: Use cached responses if available, otherwise make API call
  4. Enhanced hopenai.get_completion() Function (see the combined sketch after this list)

    • Modified to check cache before making API calls
    • Saves responses to cache when in capture mode
    • Returns cached responses when in replay mode
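
A combined sketch of components 1 and 4, reusing CacheMode and generate_cache_key from the earlier sketches; all names are hypothetical, and _call_openai stands in for the existing live-call logic:

import json
import os
from typing import Any, Dict


class OpenAICache:
    """Store and retrieve cached responses in a JSON file (format as in the sample above)."""

    def __init__(self, cache_file: str) -> None:
        self._cache_file = cache_file
        self._entries: Dict[str, Any] = {}
        if os.path.exists(cache_file):
            with open(cache_file, "r") as f:
                self._entries = json.load(f).get("entries", {})

    def has_cached_response(self, key: str) -> bool:
        return key in self._entries

    def load_from_cache(self, key: str) -> Dict[str, Any]:
        return self._entries[key]["response"]

    def save_to_cache(
        self, key: str, request: Dict[str, Any], response: Dict[str, Any]
    ) -> None:
        self._entries[key] = {"request": request, "response": response}
        with open(self._cache_file, "w") as f:
            json.dump({"version": "1.0", "entries": self._entries}, f, indent=2)


def _call_openai(
    user_prompt: str, system_prompt: str, model: str, temperature: float
) -> str:
    # Stand-in for the existing live-call logic in hopenai.py.
    raise NotImplementedError


def get_completion(
    user_prompt: str,
    system_prompt: str,
    model: str = "gpt-4o-mini",
    temperature: float = 0.1,
) -> str:
    mode = get_cache_mode()
    cache = OpenAICache("hopenai_cache.json")
    key = generate_cache_key(user_prompt, system_prompt, model, temperature)
    if mode in (CacheMode.REPLAY, CacheMode.FALLBACK) and cache.has_cached_response(key):
        return cache.load_from_cache(key)["content"]
    if mode == CacheMode.REPLAY:
        # REPLAY must never fall through to a live call.
        raise KeyError(f"No cached response for key {key}")
    content = _call_openai(user_prompt, system_prompt, model, temperature)
    if mode == CacheMode.CAPTURE:
        request = {
            "user_prompt": user_prompt,
            "system_prompt": system_prompt,
            "model": model,
            "temperature": temperature,
        }
        cache.save_to_cache(key, request, {"content": content})
    return content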

Interactions

  • Application code calls hopenai.get_completion() as before (see the usage example after this list)
  • hopenai.get_completion() checks cache mode and acts accordingly
  • In capture mode, responses are saved to the cache file
  • In replay mode, responses are loaded from the cache file
  • Cache files are stored in a configurable location
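
For example, with the hypothetical HOPENAI_CACHE_MODE variable from the Phase 1 sketch, a test session can force replay mode without touching the calling code (the keyword arguments are illustrative):

import os

# Force replay mode for the whole session; any request missing from the cache
# raises instead of hitting the live API, so the run incurs zero cost.
os.environ["HOPENAI_CACHE_MODE"] = "REPLAY"

import helpers.hopenai as hopenai

answer = hopenai.get_completion(
    "What is machine learning?",
    system_prompt="You are a helpful AI assistant.",
    model="gpt-4o-mini",
)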

Test Plan

  1. Unit Tests

    • Test cache key generation for consistency (see the example test after this list)
    • Test cache storage and retrieval
  2. Integration Tests

    • Test different cache modes, making sure not to make live API calls in tests
  3. End-to-End Tests

    • Create and run end-to-end tests with replay mode to verify determinism
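
As a concrete example of the first unit test, assuming the generate_cache_key sketch above (unittest style, mirroring the existing Test_* naming):

import unittest


class Test_generate_cache_key(unittest.TestCase):
    def test_deterministic(self) -> None:
        # The same request must always map to the same key.
        key1 = generate_cache_key("What is ML?", "Be helpful.", "gpt-4o-mini", 0.1)
        key2 = generate_cache_key("What is ML?", "Be helpful.", "gpt-4o-mini", 0.1)
        self.assertEqual(key1, key2)

    def test_normalization(self) -> None:
        # Leading / trailing whitespace should not change the key.
        key1 = generate_cache_key("What is ML?", "Be helpful.", "gpt-4o-mini", 0.1)
        key2 = generate_cache_key("  What is ML?  ", "Be helpful.", "gpt-4o-mini", 0.1)
        self.assertEqual(key1, key2)

    def test_different_inputs(self) -> None:
        # Different prompts must produce different keys.
        key1 = generate_cache_key("What is ML?", "Be helpful.", "gpt-4o-mini", 0.1)
        key2 = generate_cache_key("What is AI?", "Be helpful.", "gpt-4o-mini", 0.1)
        self.assertNotEqual(key1, key2)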

Documentation

  1. Developer Guide

    • How to enable/disable caching
    • How to manage cache files
    • Best practices for testing with cached responses
    • Information about the overhead of the caching mechanism and a response-time comparison between live API calls and cached responses
  2. Code Documentation

    • Clear docstrings for all new functions and classes
    • Comments explaining the caching logic
  3. Example Usage

    • Sample code showing how to use different cache modes
    • Examples of updating cache files
  4. Architecture Documentation

    • Diagram showing the flow of data through the caching system
    • Explanation of design decisions

gpsaggese avatar May 09 '25 23:05 gpsaggese

@srinivassaitangudu I suggest doing a proposal PR since this can be implemented in many ways, and some implementations are more efficient than others.

E.g., you don't need LangChain anymore since we started using OpenRouter, which allows connecting to all models through the OpenAI interface.

gpsaggese avatar May 09 '25 23:05 gpsaggese

@gpsaggese It seems there is no "openai" library in the Docker environment.

mypy-extensions           1.0.0
nbclient                  0.10.2
nbconvert                 7.16.6
nbformat                  5.10.4
nest-asyncio              1.6.0
networkx                  3.4.2
notebook                  7.3.2
notebook_shim             0.2.4
numpy                     2.2.3
overrides                 7.7.0
packaging                 24.2
pandas                    2.2.3
pandocfilters             1.5.1
parso                     0.8.4
pathspec                  0.12.1
patsy                     1.0.1
pexpect                   4.9.0
pillow                    11.1.0

I raised a PR yesterday; it showed the same error in the checks as well. Please have a look at the PR; if it seems okay, I'll work on documentation and a few improvements.

srinivassaitangudu avatar May 13 '25 22:05 srinivassaitangudu

  1. Yes, in the dev container there is no openai package. @Shaunak01 has an issue to create a runnable dir. For now you need to install the package inside Docker bash with sudo /bin/bash -c "(source /venv/bin/activate; pip install ...

  2. Make sure to go through all the instructions carefully https://github.com/causify-ai/tutorials/issues/415

  3. Remember the rule of making things simple for the reader (e.g., the link to the PR is https://github.com/causify-ai/helpers/pull/709)

gpsaggese avatar May 15 '25 14:05 gpsaggese

I ran my code in the Docker container successfully, and similarly, I attempted to run the test cases using the "pytest" command in the Docker bash. However, it’s throwing the following error:

Docker bash

ImportError while loading conftest '/app/conftest.py'.
conftest.py:6: in <module>
    import helpers.hdbg as dbg
E   ModuleNotFoundError: No module named 'helpers'

When I ran the same tests locally, they executed successfully. I searched through the issues but couldn’t find any relevant solution.

srinivassaitangudu avatar May 15 '25 18:05 srinivassaitangudu

  1. I assume you were able to run the test cases with i run_fast_tests before any of your changes, according to https://github.com/causify-ai/tutorials/issues/415, correct?

  2. Then you tried to do i docker_bash, ran pytest inside the Docker helpers container, and got that error, correct?

  3. Not sure what you mean by "run the test locally". Did you mean outside the container, adding the needed dependency inside a virtual env?

  4. Can you copy all the log in a file and attach to this issue?

  5. Note that openai package is not supported yet in Docker container. @Shaunak01 is adding some support for it, so you need to add the package before running the tests in the container

gpsaggese avatar May 15 '25 19:05 gpsaggese

  1. Correct
  2. Correct
  3. Yes, I mean outside the container, after adding the dependency in a virtual env.
  4. This is the entire log.
ImportError while loading conftest '/app/conftest.py'.
conftest.py:6: in <module>
   import helpers.hdbg as dbg
E   ModuleNotFoundError: No module named 'helpers'
  5. I tried the same and encountered the above error.

srinivassaitangudu avatar May 15 '25 19:05 srinivassaitangudu

Ok, so focusing on 4: what I mean is the output of docker_bash, to see if there is any warning.

Usually these problems can be solved more easily by screen sharing, but let's give it a last shot before jumping on a call.

  1. If you do

bash -c 'invoke docker_cmd -c "pytest ./helpers/test/test_hopenai.py::Test_load_response_from_cache"' 2>&1 | tee log_fail.txt

You get a failure like

==================================== ERRORS ====================================
________________ ERROR collecting helpers/test/test_hopenai.py _________________
ImportError while importing test module '/app/helpers/test/test_hopenai.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/venv/lib/python3.12/site-packages/_pytest/python.py:493: in importtestmodule
    mod = import_path(
/venv/lib/python3.12/site-packages/_pytest/pathlib.py:587: in import_path
    importlib.import_module(module_name)
/usr/lib/python3.12/importlib/__init__.py:90: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
<frozen importlib._bootstrap>:1387: in _gcd_import
    ???
<frozen importlib._bootstrap>:1360: in _find_and_load
    ???
<frozen importlib._bootstrap>:1331: in _find_and_load_unlocked
    ???
<frozen importlib._bootstrap>:935: in _load_unlocked
    ???
/venv/lib/python3.12/site-packages/_pytest/assertion/rewrite.py:184: in exec_module
    exec(co, module.__dict__)
helpers/test/test_hopenai.py:7: in <module>
    import helpers.hopenai as hopenai
helpers/hopenai.py:16: in <module>
    import openai
E   ModuleNotFoundError: No module named 'openai'

which is what I expect

  2. If you do

bash -c 'invoke docker_cmd -c "source script.sh"' 2>&1 | tee log_success.txt

> cat script.sh
sudo /bin/bash -c '(source /venv/bin/activate; pip install openai)'
pytest ./helpers/test/test_hopenai.py::Test_load_response_from_cache

16:15:14 - INFO hdbg.py init_logger:1018 > cmd='/venv/bin/pytest ./helpers/test/test_hopenai.py::Test_load_response_from_cache'
collected 2 items

helpers/test/test_hopenai.py::Test_load_response_from_cache::test1 (0.00 s) PASSED [ 50%]
helpers/test/test_hopenai.py::Test_load_response_from_cache::test2 (0.00 s) PASSED [100%]

============================= slowest 3 durations ==============================
0.00s call helpers/test/test_hopenai.py::Test_load_response_from_cache::test1
0.00s setup helpers/test/test_hopenai.py::Test_load_response_from_cache::test1
0.00s setup helpers/test/test_hopenai.py::Test_load_response_from_cache::test2
============================== 2 passed in 1.95s ===============================


I've attached the two logs.

[log_fail.txt](https://github.com/user-attachments/files/20233983/log_fail.txt)
[log_success.txt](https://github.com/user-attachments/files/20233984/log_success.txt)

Can you look at the logs and spot a problem with what you are doing?
You can also run those commands and send me the logs.

If you can't make progress, we'll jump on a call and fix it together

gpsaggese avatar May 15 '25 20:05 gpsaggese

Sorry for wasting your time. It’s working fine now, but I’m not sure why the issue occurred earlier.

srinivassaitangudu avatar May 15 '25 21:05 srinivassaitangudu