Capture / replay interactions with OpenAI / LangChain
Specs
Problem Statement
- Our software uses OpenAI's API through `get_completion()` in `helpers/hopenai.py` for LLM operations
- Running tests for it would incur OpenAI API charges due to frequent API calls (hundreds of times daily from CI and developers)
- Test results would lack determinism because OpenAI responses may vary for the same input
- We need a way to "freeze" expected outputs for reliable testing
Current Implementation
- No tests implemented for `hopenai.get_completion()` due to the reasons described above
- Caching through `functools.lru_cache` for `hopenai.get_completion()`
- Cost tracking exists, but no mechanism to avoid costs during testing
Proposed Implementation
The proposed implementation should reuse `./helpers/hcache_simple.py`
Phase 1: Basic Capture and Replay Functionality
- Create a capture mechanism to record OpenAI API interactions
- Implement a replay mechanism to use recorded responses instead of live API calls
- Add configuration options to control capture/replay behavior
Sample cache file
{
  "version": "1.0",
  "metadata": {
    "created_at": "2023-06-15T10:30:45Z",
    "last_updated": "2023-06-15T14:22:10Z"
  },
  "entries": {
    "cache_key_1": {
      "request": {
        "user_prompt": "What is machine learning?",
        "system_prompt": "You are a helpful AI assistant.",
        "model": "gpt-4o-mini",
        "temperature": 0.1,
        "additional_params": {
          "max_tokens": 500,
          "top_p": 1.0
        }
      },
      "response": {
        "content": "Machine learning is a subset of artificial intelligence...",
        "usage": {
          "prompt_tokens": 25,
          "completion_tokens": 150,
          "total_tokens": 175
        },
        "cost": 0.00123,
        "timestamp": "2023-06-15T10:30:45Z"
      }
    }
  }
}
1. Version: Tracks the cache format version for future compatibility
2. Metadata: Contains information about the cache file itself
   - created_at: When the cache file was first created
   - last_updated: When the cache was last modified
3. Entries: A dictionary of cached interactions, where:
   - Each key is a unique identifier generated from the request parameters
   - Each value contains:
     - Request: Complete information about the original request
       - user_prompt: The user's input
       - system_prompt: The system instructions
       - model: The OpenAI model used
       - temperature: The temperature setting
       - additional_params: Any other parameters passed to the API
     - Response: Complete information about the API response
       - content: The actual text response from the model
       - usage: Token usage information
       - cost: The calculated cost of the API call
       - timestamp: When the response was received
Cache Key Generation
The cache key should be a deterministic hash of the request parameters to ensure consistent retrieval. The key generation should:
1. Include all parameters that affect the response (user_prompt, system_prompt, model, temperature, etc.)
2. Normalize inputs (e.g., trim whitespace, standardize formatting)
3. Use a consistent hashing algorithm (e.g., SHA-256), as in the sketch below
4. Be collision-resistant for different inputs
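A minimal sketch of such a key function, assuming the request is fully described by the parameters listed in the sample cache file; the name `generate_cache_key` and the exact normalization rules are placeholders, not existing helpers:

```python
import hashlib
import json
from typing import Any, Dict, Optional


def generate_cache_key(
    user_prompt: str,
    system_prompt: str,
    model: str,
    temperature: float,
    additional_params: Optional[Dict[str, Any]] = None,
) -> str:
    """
    Build a deterministic SHA-256 key from the parameters that affect the response.
    """
    # Normalize inputs so that semantically identical requests map to the same key.
    payload = {
        "user_prompt": user_prompt.strip(),
        "system_prompt": system_prompt.strip(),
        "model": model.strip().lower(),
        "temperature": round(float(temperature), 4),
        "additional_params": additional_params or {},
    }
    # `sort_keys=True` makes the serialization independent of dict insertion order.
    serialized = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(serialized.encode("utf-8")).hexdigest()
```

Serializing with `sort_keys=True` keeps the key stable across runs, and SHA-256 makes collisions between different requests practically impossible.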
Phase 2: Integration with Testing Framework
- Integrate with our testing framework to automatically use replay mode during tests
- Add utilities to manage and update the cache files
- Add tests for `hopenai.get_completion()` with replay mode (one way to wire replay mode into pytest is sketched below)
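One possible way to make the testing framework default to replay mode is an autouse fixture in `conftest.py`, sketched below; the environment variable `HOPENAI_CACHE_MODE` and the mode names are assumptions matching the modes proposed in the Architecture section, not existing configuration:

```python
import pytest


@pytest.fixture(autouse=True)
def _force_replay_mode(monkeypatch: pytest.MonkeyPatch) -> None:
    """
    Force the (hypothetical) cache into replay mode for every test, so that a
    missing cache entry fails loudly instead of silently hitting the live API.
    """
    monkeypatch.setenv("HOPENAI_CACHE_MODE", "REPLAY")
```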
Phase 3: Advanced Features
- Implement intelligent cache invalidation
- Create tools for cache inspection
Acceptance Criteria
- Zero OpenAI API calls during test runs when using replay mode
- Identical test results across multiple runs when using the same cache
- Simple interface to enable/disable capture and replay modes
- Ability to update cached responses when needed
- Minimal changes required to existing code that uses `hopenai.get_completion()`
- Comprehensive documentation for developers
Architecture
Classes and Components
- `OpenAICache`
  - Responsible for storing and retrieving cached responses
  - Methods: `save_to_cache()`, `load_from_cache()`, `has_cached_response()`
  - Uses JSON files for storage
- Cache key generator
  - Creates unique keys based on input parameters
  - Handles normalization of inputs for consistent key generation
- Supported cache modes
  - `DISABLED`: No caching
  - `CAPTURE`: Make API calls and save responses to the cache
  - `REPLAY`: Use cached responses, fail if not in the cache
  - `FALLBACK`: Use cached responses if available, otherwise make an API call
- Enhanced `hopenai.get_completion()` function (a combined sketch of these components is shown after this list)
  - Modified to check the cache before making API calls
  - Saves responses to the cache when in capture mode
  - Returns cached responses when in replay mode
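A combined sketch of these components is shown below. It only illustrates the intended control flow: every name except `get_completion()` is made up, the call signature is simplified, and the real implementation is expected to build on `./helpers/hcache_simple.py` rather than the standalone class shown here.

```python
import enum
import hashlib
import json
import os
from typing import Any, Dict


class CacheMode(enum.Enum):
    # The four cache modes listed above.
    DISABLED = "DISABLED"
    CAPTURE = "CAPTURE"
    REPLAY = "REPLAY"
    FALLBACK = "FALLBACK"


def _get_cache_mode() -> CacheMode:
    # How the mode is configured is an open choice; an env var is one simple option.
    return CacheMode(os.environ.get("HOPENAI_CACHE_MODE", "DISABLED"))


def _cache_key(request: Dict[str, Any]) -> str:
    # Deterministic SHA-256 key over the request parameters (see the earlier sketch).
    serialized = json.dumps(request, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(serialized.encode("utf-8")).hexdigest()


class OpenAICache:
    """Store and retrieve cached OpenAI responses in a JSON file."""

    def __init__(self, cache_file: str = "hopenai_cache.json") -> None:
        self._cache_file = cache_file
        self._entries: Dict[str, Dict[str, Any]] = {}
        if os.path.exists(cache_file):
            with open(cache_file, encoding="utf-8") as f:
                self._entries = json.load(f).get("entries", {})

    def has_cached_response(self, key: str) -> bool:
        return key in self._entries

    def load_from_cache(self, key: str) -> Dict[str, Any]:
        return self._entries[key]["response"]

    def save_to_cache(
        self, key: str, request: Dict[str, Any], response: Dict[str, Any]
    ) -> None:
        # The `metadata` block from the sample file is omitted for brevity.
        self._entries[key] = {"request": request, "response": response}
        with open(self._cache_file, "w", encoding="utf-8") as f:
            json.dump({"version": "1.0", "entries": self._entries}, f, indent=2)


def _call_openai_live(request: Dict[str, Any]) -> Dict[str, Any]:
    # Placeholder for the existing live-API code path in `hopenai`.
    raise NotImplementedError


def get_completion(
    user_prompt: str,
    *,
    system_prompt: str = "",
    model: str = "gpt-4o-mini",
    temperature: float = 0.1,
    **kwargs: Any,
) -> str:
    """Cache-aware version of `get_completion()`."""
    request = {
        "user_prompt": user_prompt,
        "system_prompt": system_prompt,
        "model": model,
        "temperature": temperature,
        "additional_params": kwargs,
    }
    mode = _get_cache_mode()
    cache = OpenAICache()
    key = _cache_key(request)
    # Replay / fallback: serve from the cache when an entry exists.
    if mode in (CacheMode.REPLAY, CacheMode.FALLBACK) and cache.has_cached_response(key):
        return cache.load_from_cache(key)["content"]
    # Strict replay: a cache miss is an error, never a live API call.
    if mode is CacheMode.REPLAY:
        raise KeyError(f"No cached response for key {key}")
    # DISABLED, CAPTURE, or FALLBACK on a miss: call the live API.
    response = _call_openai_live(request)
    if mode is CacheMode.CAPTURE:
        cache.save_to_cache(key, request, response)
    return response["content"]
```

Keeping the mode check inside `get_completion()` is what lets existing callers stay unchanged.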
Interactions
- Application code calls `hopenai.get_completion()` as before
- `hopenai.get_completion()` checks the cache mode and acts accordingly (see the usage example after this list)
  - In capture mode, responses are saved to the cache file
  - In replay mode, responses are loaded from the cache file
- Cache files are stored in a configurable location
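Using the placeholder environment variable from the sketch above, the intended flow from the application's point of view would look roughly like this (the call signature of `get_completion()` is simplified):

```python
import os

import helpers.hopenai as hopenai

# First run (e.g., on a developer machine with an API key): record the response.
# `HOPENAI_CACHE_MODE` stands in for whatever configuration mechanism is chosen.
os.environ["HOPENAI_CACHE_MODE"] = "CAPTURE"
answer_live = hopenai.get_completion("What is machine learning?")

# Later runs (e.g., in CI): replay from the cache file; no API call, no cost.
os.environ["HOPENAI_CACHE_MODE"] = "REPLAY"
answer_cached = hopenai.get_completion("What is machine learning?")

# Replay returns exactly what was captured, which is what makes tests deterministic.
assert answer_live == answer_cached
```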
Test Plan
- Unit Tests
  - Test cache key generation for consistency
  - Test cache storage and retrieval (a sketch of both follows this list)
- Integration Tests
  - Test the different cache modes, making sure not to make live API calls in tests
- End-to-End Tests
  - Create and run end-to-end tests with replay mode to verify determinism
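A sketch of the two unit tests is given below. It assumes the illustrative `OpenAICache` and `_cache_key` names from the architecture sketch; the real tests would live in `helpers/test/test_hopenai.py` and follow the repo's test conventions.

```python
# Hypothetical imports: these names come from the architecture sketch above,
# not from the current `helpers/hopenai.py`.
# from helpers.hopenai import OpenAICache, _cache_key


def test_cache_key_is_deterministic() -> None:
    request = {"user_prompt": "What is ML?", "model": "gpt-4o-mini", "temperature": 0.1}
    # The same request must always map to the same key...
    assert _cache_key(request) == _cache_key(request)
    # ...and a different request must map to a different key.
    assert _cache_key(request) != _cache_key({**request, "temperature": 0.5})


def test_cache_round_trip(tmp_path) -> None:
    cache_file = str(tmp_path / "cache.json")
    cache = OpenAICache(cache_file=cache_file)
    request = {"user_prompt": "What is ML?"}
    response = {"content": "Machine learning is ..."}
    key = _cache_key(request)
    assert not cache.has_cached_response(key)
    cache.save_to_cache(key, request, response)
    # A fresh instance must see the entry persisted in the JSON file.
    reloaded = OpenAICache(cache_file=cache_file)
    assert reloaded.has_cached_response(key)
    assert reloaded.load_from_cache(key) == response
```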
Documentation
- Developer Guide
  - How to enable/disable caching
  - How to manage cache files
  - Best practices for testing with cached responses
  - Information about the overhead of the caching mechanism and a response-time comparison between live API calls and cached responses
- Code Documentation
  - Clear docstrings for all new functions and classes
  - Comments explaining the caching logic
- Example Usage
  - Sample code showing how to use the different cache modes
  - Examples of updating cache files
- Architecture Documentation
  - Diagram showing the flow of data through the caching system
  - Explanation of design decisions
@srinivassaitangudu I suggest doing a proposal PR since this can be implemented in many ways, and some implementations are more efficient than others.
E.g., you don't need LangChain anymore since we started using OpenRouter, which allows connecting to all models through the OpenAI interface.
@gpsaggese Seems like there is no `openai` library in the Docker environment.
mypy-extensions 1.0.0
nbclient 0.10.2
nbconvert 7.16.6
nbformat 5.10.4
nest-asyncio 1.6.0
networkx 3.4.2
notebook 7.3.2
notebook_shim 0.2.4
numpy 2.2.3
overrides 7.7.0
packaging 24.2
pandas 2.2.3
pandocfilters 1.5.1
parso 0.8.4
pathspec 0.12.1
patsy 1.0.1
pexpect 4.9.0
pillow 11.1.0
I raised a PR yesterday, and it showed the same error in the checks as well. Please have a look at the PR; if it seems okay, I'll work on documentation and a few improvements.
- Yes, in the dev container there is no `openai` package. @Shaunak01 has an issue to create a runnable dir. For now you need to install the package inside Docker bash with
  `sudo /bin/bash -c "(source /venv/bin/activate; pip install ...`
- Make sure to go through all the instructions carefully: https://github.com/causify-ai/tutorials/issues/415
- Remember the rule of making things simple for the reader (e.g., the link of the PR is https://github.com/causify-ai/helpers/pull/709)
I ran my code in the Docker container successfully, and I also attempted to run the test cases using the `pytest` command in the Docker bash. However, it throws the following error:
Docker bash
ImportError while loading conftest '/app/conftest.py'.
conftest.py:6: in <module>
import helpers.hdbg as dbg
E ModuleNotFoundError: No module named 'helpers'
When I ran the same tests locally, they executed successfully. I searched through the issues but couldn’t find any relevant solution.
- I assume you were able to run the test cases with `i run_fast_tests` before any of your changes, according to https://github.com/causify-ai/tutorials/issues/415, correct?
- Then you tried to do `i docker_bash`, ran pytest inside the Docker helper container, and you got that error, correct?
- Not sure what you mean by "run the test locally". Did you mean outside the container, adding the needed dependency inside a virtual env?
- Can you copy all the log into a file and attach it to this issue?
- Note that the openai package is not supported yet in the Docker container. @Shaunak01 is adding some support for it, so you need to add the package before running the tests in the container.
- Correct
- Correct
- Yes, I mean outside the container, after adding the dependency in a virtual env.
- This is the entire log.
ImportError while loading conftest '/app/conftest.py'.
conftest.py:6: in <module>
import helpers.hdbg as dbg
E ModuleNotFoundError: No module named 'helpers'
- I tried the same and encountered the above error.
OK, so focusing on 4: what I mean is the output of `docker_bash`, to see if there is any warning.
Usually these problems can be solved more easily by screen sharing, but let's give it a last shot before jumping on a call.
- If you do `bash -c 'invoke docker_cmd -c "pytest ./helpers/test/test_hopenai.py::Test_load_response_from_cache"' 2>&1 | tee log_fail.txt`
You get a failure like
==================================== ERRORS ====================================
________________ ERROR collecting helpers/test/test_hopenai.py _________________
ImportError while importing test module '/app/helpers/test/test_hopenai.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/venv/lib/python3.12/site-packages/_pytest/python.py:493: in importtestmodule
mod = import_path(
/venv/lib/python3.12/site-packages/_pytest/pathlib.py:587: in import_path
importlib.import_module(module_name)
/usr/lib/python3.12/importlib/__init__.py:90: in import_module
return _bootstrap._gcd_import(name[level:], package, level)
<frozen importlib._bootstrap>:1387: in _gcd_import
???
<frozen importlib._bootstrap>:1360: in _find_and_load
???
<frozen importlib._bootstrap>:1331: in _find_and_load_unlocked
???
<frozen importlib._bootstrap>:935: in _load_unlocked
???
/venv/lib/python3.12/site-packages/_pytest/assertion/rewrite.py:184: in exec_module
exec(co, module.__dict__)
helpers/test/test_hopenai.py:7: in <module>
import helpers.hopenai as hopenai
helpers/hopenai.py:16: in <module>
import openai
E ModuleNotFoundError: No module named 'openai'
which is what I expect
- If you do `bash -c 'invoke docker_cmd -c "source script.sh"' 2>&1 | tee log_success.txt`
> cat script.sh
sudo /bin/bash -c '(source /venv/bin/activate; pip install openai)'
pytest ./helpers/test/test_hopenai.py::Test_load_response_from_cache
16:15:14 - INFO hdbg.py init_logger:1018 > cmd='/venv/bin/pytest ./helpers/test/test_hopenai.py::Test_load_response_from_cache'
collected 2 items
helpers/test/test_hopenai.py::Test_load_response_from_cache::test1 (0.00 s) PASSED [ 50%]
helpers/test/test_hopenai.py::Test_load_response_from_cache::test2 (0.00 s) PASSED [100%]
============================= slowest 3 durations ==============================
0.00s call helpers/test/test_hopenai.py::Test_load_response_from_cache::test1
0.00s setup helpers/test/test_hopenai.py::Test_load_response_from_cache::test1
0.00s setup helpers/test/test_hopenai.py::Test_load_response_from_cache::test2
============================== 2 passed in 1.95s ===============================
I've attached the two logs.
[log_fail.txt](https://github.com/user-attachments/files/20233983/log_fail.txt)
[log_success.txt](https://github.com/user-attachments/files/20233984/log_success.txt)
Can you look at the logs and spot a problem with what you are doing?
You can also run those commands and send me the logs
If you can't make progress, we'll jump on a call and fix it together
Sorry for wasting your time. It’s working fine now, but I’m not sure why the issue occurred earlier.