Capture / replay interactions with OpenAI / LangChain
Specs
Problem Statement
- Our software uses OpenAI's API through `get_completion()` in `helpers/hopenai.py` for LLM operations
- Running tests for it would incur OpenAI API charges due to frequent API calls (hundreds of times daily from CI and developers)
- Test results would lack determinism because OpenAI responses may vary for the same input
- We need a way to "freeze" expected outputs for reliable testing
Current Implementation
- No tests implemented for `hopenai.get_completion()` due to the reasons described above
- Caching through `functools.lru_cache` for `hopenai.get_completion()`
- Cost tracking exists, but no mechanism to avoid costs during testing
Proposed Implementation
The proposed implementation should reuse `./helpers/hcache_simple.py`
Phase 1: Basic Capture and Replay Functionality
- Create a capture mechanism to record OpenAI API interactions
- Implement a replay mechanism to use recorded responses instead of live API calls
- Add configuration options to control capture/replay behavior
Sample cache file
{
  "version": "1.0",
  "metadata": {
    "created_at": "2023-06-15T10:30:45Z",
    "last_updated": "2023-06-15T14:22:10Z"
  },
  "entries": {
    "cache_key_1": {
      "request": {
        "user_prompt": "What is machine learning?",
        "system_prompt": "You are a helpful AI assistant.",
        "model": "gpt-4o-mini",
        "temperature": 0.1,
        "additional_params": {
          "max_tokens": 500,
          "top_p": 1.0
        }
      },
      "response": {
        "content": "Machine learning is a subset of artificial intelligence...",
        "usage": {
          "prompt_tokens": 25,
          "completion_tokens": 150,
          "total_tokens": 175
        },
        "cost": 0.00123,
        "timestamp": "2023-06-15T10:30:45Z"
      }
    }
  }
}
1. Version: Tracks the cache format version for future compatibility
2. Metadata: Contains information about the cache file itself
   - created_at: When the cache file was first created
   - last_updated: When the cache was last modified
3. Entries: A dictionary of cached interactions, where:
   - Each key is a unique identifier generated from the request parameters
   - Each value contains:
     - Request: Complete information about the original request
       - user_prompt: The user's input
       - system_prompt: The system instructions
       - model: The OpenAI model used
       - temperature: The temperature setting
       - additional_params: Any other parameters passed to the API
     - Response: Complete information about the API response
       - content: The actual text response from the model
       - usage: Token usage information
       - cost: The calculated cost of the API call
       - timestamp: When the response was received
Cache Key Generation
The cache key should be a deterministic hash of the request parameters to ensure consistent retrieval. The key generation should:
1. Include all parameters that affect the response (user_prompt, system_prompt, model, temperature, etc.)
2. Normalize inputs (e.g., trim whitespace, standardize formatting)
3. Use a consistent hashing algorithm (e.g., SHA-256), as in the sketch below
4. Be collision-resistant for different inputs
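A minimal sketch of such a key function, assuming the request is fully described by the parameters listed in the sample cache file; the name `generate_cache_key` and the exact normalization rules are placeholders, not existing helpers:

```python
import hashlib
import json
from typing import Any, Dict, Optional


def generate_cache_key(
    user_prompt: str,
    system_prompt: str,
    model: str,
    temperature: float,
    additional_params: Optional[Dict[str, Any]] = None,
) -> str:
    """
    Build a deterministic SHA-256 key from the parameters that affect the response.
    """
    # Normalize inputs so that semantically identical requests map to the same key.
    payload = {
        "user_prompt": user_prompt.strip(),
        "system_prompt": system_prompt.strip(),
        "model": model.strip().lower(),
        "temperature": round(float(temperature), 4),
        "additional_params": additional_params or {},
    }
    # `sort_keys=True` makes the serialization independent of dict insertion order.
    serialized = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(serialized.encode("utf-8")).hexdigest()
```

Serializing with `sort_keys=True` keeps the key stable across runs, and SHA-256 makes collisions between different requests practically impossible.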
Phase 2: Integration with Testing Framework
- Integrate with our testing framework to automatically use replay mode during tests
- Add utilities to manage and update the cache files
- Add tests for `hopenai.get_completion()` with replay mode (one way to wire replay mode into pytest is sketched below)
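One possible way to make the testing framework default to replay mode is an autouse fixture in `conftest.py`, sketched below; the environment variable `HOPENAI_CACHE_MODE` and the mode names are assumptions matching the modes proposed in the Architecture section, not existing configuration:

```python
import pytest


@pytest.fixture(autouse=True)
def _force_replay_mode(monkeypatch: pytest.MonkeyPatch) -> None:
    """
    Force the (hypothetical) cache into replay mode for every test, so that a
    missing cache entry fails loudly instead of silently hitting the live API.
    """
    monkeypatch.setenv("HOPENAI_CACHE_MODE", "REPLAY")
```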
Phase 3: Advanced Features
- Implement intelligent cache invalidation
- Create tools for cache inspection
Acceptance Criteria
- Zero OpenAI API calls during test runs when using replay mode
- Identical test results across multiple runs when using the same cache
- Simple interface to enable/disable capture and replay modes
- Ability to update cached responses when needed
- Minimal changes required to existing code that uses `hopenai.get_completion()`
- Comprehensive documentation for developers
Architecture
Classes and Components
- `OpenAICache`
  - Responsible for storing and retrieving cached responses
  - Methods: `save_to_cache()`, `load_from_cache()`, `has_cached_response()`
  - Uses JSON files for storage
- Cache key generator
  - Creates unique keys based on input parameters
  - Handles normalization of inputs for consistent key generation
- Supported cache modes
  - `DISABLED`: No caching
  - `CAPTURE`: Make API calls and save responses to the cache
  - `REPLAY`: Use cached responses, fail if not in the cache
  - `FALLBACK`: Use cached responses if available, otherwise make an API call
- Enhanced `hopenai.get_completion()` function (a combined sketch of these components is shown after this list)
  - Modified to check the cache before making API calls
  - Saves responses to the cache when in capture mode
  - Returns cached responses when in replay mode
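A combined sketch of these components is shown below. It only illustrates the intended control flow: every name except `get_completion()` is made up, the call signature is simplified, and the real implementation is expected to build on `./helpers/hcache_simple.py` rather than the standalone class shown here.

```python
import enum
import hashlib
import json
import os
from typing import Any, Dict


class CacheMode(enum.Enum):
    # The four cache modes listed above.
    DISABLED = "DISABLED"
    CAPTURE = "CAPTURE"
    REPLAY = "REPLAY"
    FALLBACK = "FALLBACK"


def _get_cache_mode() -> CacheMode:
    # How the mode is configured is an open choice; an env var is one simple option.
    return CacheMode(os.environ.get("HOPENAI_CACHE_MODE", "DISABLED"))


def _cache_key(request: Dict[str, Any]) -> str:
    # Deterministic SHA-256 key over the request parameters (see the earlier sketch).
    serialized = json.dumps(request, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(serialized.encode("utf-8")).hexdigest()


class OpenAICache:
    """Store and retrieve cached OpenAI responses in a JSON file."""

    def __init__(self, cache_file: str = "hopenai_cache.json") -> None:
        self._cache_file = cache_file
        self._entries: Dict[str, Dict[str, Any]] = {}
        if os.path.exists(cache_file):
            with open(cache_file, encoding="utf-8") as f:
                self._entries = json.load(f).get("entries", {})

    def has_cached_response(self, key: str) -> bool:
        return key in self._entries

    def load_from_cache(self, key: str) -> Dict[str, Any]:
        return self._entries[key]["response"]

    def save_to_cache(
        self, key: str, request: Dict[str, Any], response: Dict[str, Any]
    ) -> None:
        # The `metadata` block from the sample file is omitted for brevity.
        self._entries[key] = {"request": request, "response": response}
        with open(self._cache_file, "w", encoding="utf-8") as f:
            json.dump({"version": "1.0", "entries": self._entries}, f, indent=2)


def _call_openai_live(request: Dict[str, Any]) -> Dict[str, Any]:
    # Placeholder for the existing live-API code path in `hopenai`.
    raise NotImplementedError


def get_completion(
    user_prompt: str,
    *,
    system_prompt: str = "",
    model: str = "gpt-4o-mini",
    temperature: float = 0.1,
    **kwargs: Any,
) -> str:
    """Cache-aware version of `get_completion()`."""
    request = {
        "user_prompt": user_prompt,
        "system_prompt": system_prompt,
        "model": model,
        "temperature": temperature,
        "additional_params": kwargs,
    }
    mode = _get_cache_mode()
    cache = OpenAICache()
    key = _cache_key(request)
    # Replay / fallback: serve from the cache when an entry exists.
    if mode in (CacheMode.REPLAY, CacheMode.FALLBACK) and cache.has_cached_response(key):
        return cache.load_from_cache(key)["content"]
    # Strict replay: a cache miss is an error, never a live API call.
    if mode is CacheMode.REPLAY:
        raise KeyError(f"No cached response for key {key}")
    # DISABLED, CAPTURE, or FALLBACK on a miss: call the live API.
    response = _call_openai_live(request)
    if mode is CacheMode.CAPTURE:
        cache.save_to_cache(key, request, response)
    return response["content"]
```

Keeping the mode check inside `get_completion()` is what lets existing callers stay unchanged.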
Interactions
- Application code calls `hopenai.get_completion()` as before
- `hopenai.get_completion()` checks the cache mode and acts accordingly (see the usage example after this list)
  - In capture mode, responses are saved to the cache file
  - In replay mode, responses are loaded from the cache file
- Cache files are stored in a configurable location
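Using the placeholder environment variable from the sketch above, the intended flow from the application's point of view would look roughly like this (the call signature of `get_completion()` is simplified):

```python
import os

import helpers.hopenai as hopenai

# First run (e.g., on a developer machine with an API key): record the response.
# `HOPENAI_CACHE_MODE` stands in for whatever configuration mechanism is chosen.
os.environ["HOPENAI_CACHE_MODE"] = "CAPTURE"
answer_live = hopenai.get_completion("What is machine learning?")

# Later runs (e.g., in CI): replay from the cache file; no API call, no cost.
os.environ["HOPENAI_CACHE_MODE"] = "REPLAY"
answer_cached = hopenai.get_completion("What is machine learning?")

# Replay returns exactly what was captured, which is what makes tests deterministic.
assert answer_live == answer_cached
```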
Test Plan
- Unit Tests
  - Test cache key generation for consistency
  - Test cache storage and retrieval (a sketch of both follows this list)
- Integration Tests
  - Test the different cache modes, making sure not to make live API calls in tests
- End-to-End Tests
  - Create and run end-to-end tests with replay mode to verify determinism
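A sketch of the two unit tests is given below. It assumes the illustrative `OpenAICache` and `_cache_key` names from the architecture sketch; the real tests would live in `helpers/test/test_hopenai.py` and follow the repo's test conventions.

```python
# Hypothetical imports: these names come from the architecture sketch above,
# not from the current `helpers/hopenai.py`.
# from helpers.hopenai import OpenAICache, _cache_key


def test_cache_key_is_deterministic() -> None:
    request = {"user_prompt": "What is ML?", "model": "gpt-4o-mini", "temperature": 0.1}
    # The same request must always map to the same key...
    assert _cache_key(request) == _cache_key(request)
    # ...and a different request must map to a different key.
    assert _cache_key(request) != _cache_key({**request, "temperature": 0.5})


def test_cache_round_trip(tmp_path) -> None:
    cache_file = str(tmp_path / "cache.json")
    cache = OpenAICache(cache_file=cache_file)
    request = {"user_prompt": "What is ML?"}
    response = {"content": "Machine learning is ..."}
    key = _cache_key(request)
    assert not cache.has_cached_response(key)
    cache.save_to_cache(key, request, response)
    # A fresh instance must see the entry persisted in the JSON file.
    reloaded = OpenAICache(cache_file=cache_file)
    assert reloaded.has_cached_response(key)
    assert reloaded.load_from_cache(key) == response
```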
Documentation
- Developer Guide
  - How to enable/disable caching
  - How to manage cache files
  - Best practices for testing with cached responses
  - Information about the overhead of the caching mechanism and a response-time comparison between live API calls and cached responses
- Code Documentation
  - Clear docstrings for all new functions and classes
  - Comments explaining the caching logic
- Example Usage
  - Sample code showing how to use the different cache modes
  - Examples of updating cache files
- Architecture Documentation
  - Diagram showing the flow of data through the caching system
  - Explanation of design decisions
@srinivassaitangudu I suggest doing a proposal PR since this can be implemented in many ways, and some implementations are more efficient than others.
E.g., you don't need LangChain anymore since we started using OpenRouter, which allows connecting to all models through the OpenAI interface.
@gpsaggese Seems like there is no `openai` library in the Docker environment.
mypy-extensions 1.0.0
nbclient 0.10.2
nbconvert 7.16.6
nbformat 5.10.4
nest-asyncio 1.6.0
networkx 3.4.2
notebook 7.3.2
notebook_shim 0.2.4
numpy 2.2.3
overrides 7.7.0
packaging 24.2
pandas 2.2.3
pandocfilters 1.5.1
parso 0.8.4
pathspec 0.12.1
patsy 1.0.1
pexpect 4.9.0
pillow 11.1.0
I raised a PR yesterday, and it showed the same error in the checks as well. Please have a look at the PR; if it seems okay, I'll work on documentation and a few improvements.
- Yes, in the dev container there is no `openai` package. @Shaunak01 has an issue to create a runnable dir. For now you need to install the package inside Docker bash with
  `sudo /bin/bash -c "(source /venv/bin/activate; pip install ...`
- Make sure to go through all the instructions carefully: https://github.com/causify-ai/tutorials/issues/415
- Remember the rule of making things simple for the reader (e.g., the link of the PR is https://github.com/causify-ai/helpers/pull/709)
I ran my code in the Docker container successfully, and I also attempted to run the test cases using the `pytest` command in the Docker bash. However, it throws the following error:
Docker bash
ImportError while loading conftest '/app/conftest.py'.
conftest.py:6: in <module>
import helpers.hdbg as dbg
E ModuleNotFoundError: No module named 'helpers'
When I ran the same tests locally, they executed successfully. I searched through the issues but couldn’t find any relevant solution.
- I assume you were able to run the test cases with `i run_fast_tests` before any of your changes, according to https://github.com/causify-ai/tutorials/issues/415, correct?
- Then you tried to do `i docker_bash`, ran pytest inside the Docker helper container, and you got that error, correct?
- Not sure what you mean by "run the test locally". Did you mean outside the container, adding the needed dependency inside a virtual env?
- Can you copy all the log into a file and attach it to this issue?
- Note that the openai package is not supported yet in the Docker container. @Shaunak01 is adding some support for it, so you need to add the package before running the tests in the container.
- Correct
- Correct
- Yes, I mean outside the container, after adding the dependency in a virtual env.
- This is the entire log.
ImportError while loading conftest '/app/conftest.py'.
conftest.py:6: in <module>
import helpers.hdbg as dbg
E ModuleNotFoundError: No module named 'helpers'
- I tried the same and encountered the above error.
OK, so focusing on 4: what I mean is the output of `docker_bash`, to see if there is any warning.
Usually these problems can be solved more easily by screen sharing, but let's give it a last shot before jumping on a call.
- If you do `bash -c 'invoke docker_cmd -c "pytest ./helpers/test/test_hopenai.py::Test_load_response_from_cache"' 2>&1 | tee log_fail.txt`
You get a failure like
==================================== ERRORS ====================================
________________ ERROR collecting helpers/test/test_hopenai.py _________________
ImportError while importing test module '/app/helpers/test/test_hopenai.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/venv/lib/python3.12/site-packages/_pytest/python.py:493: in importtestmodule
mod = import_path(
/venv/lib/python3.12/site-packages/_pytest/pathlib.py:587: in import_path
importlib.import_module(module_name)
/usr/lib/python3.12/importlib/__init__.py:90: in import_module
return _bootstrap._gcd_import(name[level:], package, level)
<frozen importlib._bootstrap>:1387: in _gcd_import
???
<frozen importlib._bootstrap>:1360: in _find_and_load
???
<frozen importlib._bootstrap>:1331: in _find_and_load_unlocked
???
<frozen importlib._bootstrap>:935: in _load_unlocked
???
/venv/lib/python3.12/site-packages/_pytest/assertion/rewrite.py:184: in exec_module
exec(co, module.__dict__)
helpers/test/test_hopenai.py:7: in <module>
import helpers.hopenai as hopenai
helpers/hopenai.py:16: in <module>
import openai
E ModuleNotFoundError: No module named 'openai'
which is what I expect
- If you do `bash -c 'invoke docker_cmd -c "source script.sh"' 2>&1 | tee log_success.txt`
> cat script.sh
sudo /bin/bash -c '(source /venv/bin/activate; pip install openai)'
pytest ./helpers/test/test_hopenai.py::Test_load_response_from_cache
16:15:14 - INFO hdbg.py init_logger:1018 > cmd='/venv/bin/pytest ./helpers/test/test_hopenai.py::Test_load_response_from_cache'
collected 2 items
helpers/test/test_hopenai.py::Test_load_response_from_cache::test1 (0.00 s) PASSED [ 50%]
helpers/test/test_hopenai.py::Test_load_response_from_cache::test2 (0.00 s) PASSED [100%]
============================= slowest 3 durations ==============================
0.00s call helpers/test/test_hopenai.py::Test_load_response_from_cache::test1
0.00s setup helpers/test/test_hopenai.py::Test_load_response_from_cache::test1
0.00s setup helpers/test/test_hopenai.py::Test_load_response_from_cache::test2
============================== 2 passed in 1.95s ===============================
I've attached the two logs.
[log_fail.txt](https://github.com/user-attachments/files/20233983/log_fail.txt)
[log_success.txt](https://github.com/user-attachments/files/20233984/log_success.txt)
Can you look at the logs and spot a problem with what you are doing?
You can also run those commands and send me the logs
If you can't make progress, we'll jump on a call and fix it together
Sorry for wasting your time. It’s working fine now, but I’m not sure why the issue occurred earlier.