Turn generate_readme_index into a dockerized executable
When running test cases for causify-ai/csfy#124, I ran into the following error:
```
________________ ERROR collecting dev_scripts_helpers/documentation/test/test_generate_readme_index.py ________________
ImportError while importing test module '/app/dev_scripts_helpers/documentation/test/test_generate_readme_index.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/venv/lib/python3.12/site-packages/_pytest/python.py:493: in importtestmodule
    mod = import_path(
/venv/lib/python3.12/site-packages/_pytest/pathlib.py:587: in import_path
    importlib.import_module(module_name)
/usr/lib/python3.12/importlib/__init__.py:90: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
<frozen importlib._bootstrap>:1387: in _gcd_import
    ???
<frozen importlib._bootstrap>:1360: in _find_and_load
    ???
<frozen importlib._bootstrap>:1331: in _find_and_load_unlocked
    ???
<frozen importlib._bootstrap>:935: in _load_unlocked
    ???
/venv/lib/python3.12/site-packages/_pytest/assertion/rewrite.py:184: in exec_module
    exec(co, module.__dict__)
dev_scripts_helpers/documentation/test/test_generate_readme_index.py:4: in <module>
    import dev_scripts_helpers.documentation.generate_readme_index as dshdgrein
dev_scripts_helpers/documentation/generate_readme_index.py:60: in <module>
    import openai
E   ModuleNotFoundError: No module named 'openai'
================================ short test summary info ================================
ERROR dev_scripts_helpers/documentation/test/test_generate_readme_index.py
```
This happens because test_generate_readme_index.py imports generate_readme_index.py, which in turn imports hopenai.py and openai.
My first approach was to add the following try-except-finally to hopenai.py, but run_fast_tests still produced the same error:
```python
import os

# Taken from helpers/hchatgpt.py.
try:
    import openai
except ImportError:
    # Install the package on the fly if it is missing.
    os.system("pip install openai")
finally:
    import openai
```
This is because openai is not installed inside the Docker container.
FYI @sonniki @gpsaggese
For now let's just skip executing the unit tests with `pytest.importorskip()`.
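As a sketch, `pytest.importorskip()` placed at the top of a test module skips the whole module at collection time when the dependency is missing. In the real test file the argument would be `"openai"`; here a stdlib module is used so the snippet runs anywhere pytest is installed:

```python
import pytest

# Skip the entire test module when the dependency cannot be imported;
# in test_generate_readme_index.py the argument would be "openai".
# A stdlib module is used here so the sketch is runnable anywhere.
mod = pytest.importorskip("json")

# When the import succeeds, importorskip() returns the module object.
print(mod.dumps({"skipped": False}))
```

This avoids the `ModuleNotFoundError` at collection time: the module is reported as skipped rather than erroring out.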
There are various solutions:
- The try-install approach is not great, since every time somebody imports that module (even pytest when discovering tests!) a package gets installed
- Create a runnable dir for all the openai tests (FYI @heanhsok)
- To avoid getting charged by OpenAI for running our tests, we need an infra to mock / store / replay interactions with OpenAI (this is a nice Bounty)
Understood, I will avoid the try-install approach.
The code currently has a pseudo-mock that accepts a placeholder argument to skip OpenAI API calls. However, since hopenai.py is always imported, openai will always be imported as well, so using `pytest.importorskip()` will skip all the tests.
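For reference, the placeholder mechanism described above could look like this minimal sketch (the function and argument names are hypothetical, not the actual repo code):

```python
# Hypothetical sketch of a "placeholder" pseudo-mock: a sentinel model
# name short-circuits before any OpenAI call. Names are illustrative.
def get_summary(readme_path, model="placeholder"):
    if model == "placeholder":
        # Test mode: return a canned summary; no API call, no charge.
        return f"Placeholder summary for {readme_path}."
    # Real mode: this is where hopenai.get_completion() would be called.
    raise NotImplementedError("Real OpenAI path not sketched here.")
```

The catch is that the real module still does `import openai` at the top, so the placeholder alone does not fix test collection.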
Instead, is using real mocking a good idea?
```python
import sys
import types

# Create a mock openai module.
mock_openai = types.ModuleType("openai")
# Create a mock hopenai module.
mock_hopenai = types.ModuleType("hopenai")
mock_hopenai.get_completion = (
    lambda user_prompt, model=None: "This should never be called in placeholder mode."
)
# Inject both into sys.modules before importing the target.
sys.modules["openai"] = mock_openai
sys.modules["hopenai"] = mock_hopenai
```
The code also has a basic store/replay mechanism, since it reads and writes a README of file indexes. But I can also implement a more robust infra by writing interactions to a JSON file.
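A minimal sketch of such a JSON-based store/replay infra, assuming a cache keyed by prompt (the file name and function names are hypothetical):

```python
import json
import os

# Hypothetical record/replay cache for LLM calls, keyed by prompt.
CACHE_FILE = "openai_cache.json"


def record_or_replay(prompt, call_fn):
    """Return a cached completion for `prompt`, or call `call_fn` and cache it."""
    cache = {}
    if os.path.exists(CACHE_FILE):
        with open(CACHE_FILE) as f:
            cache = json.load(f)
    if prompt in cache:
        # Replay mode: no API call, no charge.
        return cache[prompt]
    # Record mode: perform the real call and persist the result.
    response = call_fn(prompt)
    cache[prompt] = response
    with open(CACHE_FILE, "w") as f:
        json.dump(cache, f, indent=2)
    return response
```

On the first run the interaction is recorded (and billed); subsequent runs replay from the JSON file, so unit tests never hit the API.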
For now we'll do `pytest.importorskip()` to make it easier. @gpsaggese do we want to eventually create a dockerized executable for openai? If so, we can use this issue to track it. If not, we'll close this issue.
- Ok to `importorskip` for now
- It's not going to be difficult to use dockerized executables, since it's not a "separate" process, unless we make it so. We will use the runnable subdir for everything that requires `openai`.
- Let's skip testing for now. I have a bounty for building a scaffolding infra to test this LLM stuff: https://docs.google.com/document/d/1p5b07_66F5itkakgO9aGGn5dg_IJE6Z86MGQAG9k_Eg/edit?tab=t.0#heading=h.1baotvgc5dn7
Understood, `importorskip` will be used for now.
Are interns allowed to work on / contribute to building the infra?
@heanhsok
@aangelo9 we've decided to pause/redirect this one, so you can focus on other tasks for now
@Shaunak01 let's start with a clear plan of what to do. E.g., I'm not clear on exactly what we want to do.
IMO
- the documentation tool chain is mainly using dockerized executables
- we could move this part to the linter dir (which is already kind of a runnable dir) and also add the LLM dependencies, since we are going to use LLMs in the linter
I would start with the runnable dir for Causal AutoML (the other bug) which is more important than this
> I would start with the runnable dir for Causal AutoML (the other bug) which is more important than this
@gpsaggese Do you mean "Dockerize causal_kg dependencies in a single container" (causify-ai/csfy#5771, https://github.com/causify-ai/csfy/issues/5771)?
Yes, I'm referring to that issue. This can be paused in the meantime
@gpsaggese What do we need to do with this issue now that research_amp/causal_kg is a runnable dir?
Now that the nomenclature and the code are stabilizing, off the top of my head, generate_readme_index.py should become a dockerized executable, since it requires openai and is not part of the rest.
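A minimal sketch of what the dockerized-executable wrapper could look like, assuming an image with `openai` preinstalled (the image name, mount point, and helper names are illustrative, not the actual infra):

```python
import os
import subprocess


def build_docker_cmd(script_args, image="causify/openai-tools:latest"):
    """Build the `docker run` command line.

    Construction is split from execution so it can be tested without Docker.
    """
    return [
        "docker", "run", "--rm",
        # Mount the repo and run from it inside the container.
        "-v", f"{os.getcwd()}:/src",
        "-w", "/src",
        image,
        "python", "dev_scripts_helpers/documentation/generate_readme_index.py",
    ] + list(script_args)


def run_dockerized(script_args):
    # Execute the containerized script (requires Docker and the image).
    return subprocess.run(build_docker_cmd(script_args), check=True)
```

With this approach the host environment never needs `openai` installed; only the container image does.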
I'll change the title and pause this since it's low priority.