
Turn generate_readme_index into a dockerized executable

aangelo9 opened this issue 8 months ago · 12 comments

When running test cases for causify-ai/csfy#124, I ran into the following error:

__________________________________________________________________________________ ERROR collecting dev_scripts_helpers/documentation/test/test_generate_readme_index.py ___________________________________________________________________________________
ImportError while importing test module '/app/dev_scripts_helpers/documentation/test/test_generate_readme_index.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/venv/lib/python3.12/site-packages/_pytest/python.py:493: in importtestmodule
    mod = import_path(
/venv/lib/python3.12/site-packages/_pytest/pathlib.py:587: in import_path
    importlib.import_module(module_name)
/usr/lib/python3.12/importlib/__init__.py:90: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
<frozen importlib._bootstrap>:1387: in _gcd_import
    ???
<frozen importlib._bootstrap>:1360: in _find_and_load
    ???
<frozen importlib._bootstrap>:1331: in _find_and_load_unlocked
    ???
<frozen importlib._bootstrap>:935: in _load_unlocked
    ???
/venv/lib/python3.12/site-packages/_pytest/assertion/rewrite.py:184: in exec_module
    exec(co, module.__dict__)
dev_scripts_helpers/documentation/test/test_generate_readme_index.py:4: in <module>
    import dev_scripts_helpers.documentation.generate_readme_index as dshdgrein
dev_scripts_helpers/documentation/generate_readme_index.py:60: in <module>
    import openai
E   ModuleNotFoundError: No module named 'openai'
================================================================================================================= short test summary info ==================================================================================================================
ERROR dev_scripts_helpers/documentation/test/test_generate_readme_index.py

test_generate_readme_index.py imports generate_readme_index.py, which in turn imports hopenai.py and openai. My first approach was to add the following try-except-finally to hopenai.py, but running run_fast_tests still produces the same error.

# Taken from helpers/hchatgpt.py
import os

try:
    import openai
except ImportError:
    # Install the package on the fly if it is missing.
    os.system("pip install openai")
finally:
    # Re-import so the freshly installed package is available.
    import openai

This is because openai is not installed inside the docker container.

FYI @sonniki @gpsaggese

aangelo9 · Apr 11 '25 16:04

For now let's just skip executing unit tests with pytest.importorskip()
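
A minimal sketch of what this could look like at the top of the test module (module paths are taken from the traceback above; the exact skip point is an assumption):

import pytest

# Skip the whole test module if openai is not available in the container.
pytest.importorskip("openai")

import dev_scripts_helpers.documentation.generate_readme_index as dshdgrein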

There are various solutions:

  1. The try-install approach is not great, since a package gets installed every time somebody imports that module (even when pytest is just discovering tests!)
  2. Create a runnable dir for all the openai tests (FYI @heanhsok)
  3. To avoid getting charged by OpenAI for running our tests, we need an infra to mock / store / replay interactions with OpenAI (this is a nice Bounty)

gpsaggese · Apr 13 '25 17:04

Understood, I will avoid the try-install approach.

The code currently has a pseudo mock which accepts a placeholder argument to skip openai API calls. However, since hopenai.py is always imported, openai will always be imported as well. So using pytest.importorskip() will skip all the tests.

Instead, is using real mocking a good idea?

import sys
import types

# Create a mock openai module so the real package is never needed.
mock_openai = types.ModuleType("openai")

# Create a mock hopenai module with the completion call stubbed out.
mock_hopenai = types.ModuleType("hopenai")
mock_hopenai.get_completion = (
    lambda user_prompt, model=None: "This should never be called in placeholder mode."
)

# Inject both into sys.modules before importing the module under test.
sys.modules["openai"] = mock_openai
sys.modules["hopenai"] = mock_hopenai

The code also has a basic store/replay mechanism, since it reads from and writes to a README index of files. But I can also implement a more robust infra by writing interactions to a JSON file.
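
A minimal sketch of what that JSON-based store/replay could look like (the cache file name, the wrapper name, and the helpers.hopenai import path are assumptions):

import json
import os

_CACHE_FILE = "openai_interactions.json"  # hypothetical cache location

def get_completion_with_replay(user_prompt, model=None):
    # Load previously recorded interactions, if any.
    cache = {}
    if os.path.exists(_CACHE_FILE):
        with open(_CACHE_FILE, "r") as f:
            cache = json.load(f)
    key = f"{model}:{user_prompt}"
    if key not in cache:
        # Only hit the real API on a cache miss, then record the response.
        import helpers.hopenai as hopenai
        cache[key] = hopenai.get_completion(user_prompt, model=model)
        with open(_CACHE_FILE, "w") as f:
            json.dump(cache, f, indent=2)
    return cache[key]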

aangelo9 · Apr 14 '25 06:04

For now we'll do pytest.importorskip to make it easier. @gpsaggese do we want to eventually create a dockerized executable for openai? If so, we can use this issue to track it. If not, we'll close this issue.

sonniki · Apr 14 '25 14:04

  1. Ok to importorskip for now
  2. It's not going to be difficult to use dockerized executables since it's not a "separate" process, unless we make it so. We will use the runnable subdir for everything that requires openai.
  3. Let's skip testing for now. I have a bounty for building a scaffolding infra to test this LLM stuff https://docs.google.com/document/d/1p5b07_66F5itkakgO9aGGn5dg_IJE6Z86MGQAG9k_Eg/edit?tab=t.0#heading=h.1baotvgc5dn7

gpsaggese · Apr 14 '25 22:04

Understood, importorskip will be used for now.

Are Interns allowed to work on/contribute to building the infra?

aangelo9 · Apr 14 '25 23:04

@heanhsok

gpsaggese · Apr 16 '25 13:04

@aangelo9 we've decided to pause/redirect this one, so you can focus on other tasks for now

sonniki · Apr 16 '25 13:04

@Shaunak01 let's start with a clear plan of what to do. E.g., I'm not clear on exactly what we want to do.

IMO

  • the documentation tool chain is mainly using dockerized executables
  • we could move this part to the linter dir (which is already kind of a runnable dir) and also add the LLM dependencies. We are going to use LLMs in the linter

I would start with the runnable dir for Causal AutoML (the other bug) which is more important than this

gpsaggese · May 02 '25 14:05

I would start with the runnable dir for Causal AutoML (the other bug) which is more important than this

@gpsaggese Do you mean "Dockerize causal_kg dependencies in a single container" (causify-ai/csfy#5771, https://github.com/causify-ai/csfy/issues/5771)?

Shaunak01 · May 05 '25 13:05

Yes, I'm referring to that issue. This can be paused in the meantime

gpsaggese · May 07 '25 02:05

@gpsaggese What do we need to do with this issue now that research_amp/causal_kg is a runnable dir?

Shaunak01 · May 28 '25 18:05

Now that the nomenclature and the code are stabilizing, off the top of my head, generate_readme_index.py should become a dockerized executable, since it requires openai and it's not part of the rest.
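
A minimal sketch of what such a dockerized executable could look like: a thin wrapper that runs the script inside a container that has openai installed (the image name, the mount layout, and invoking the script directly are assumptions, not the repo's actual dockerization infra):

import subprocess

def run_generate_readme_index_dockerized(repo_dir):
    # Hypothetical image built with openai installed (e.g. `pip install openai`
    # in its Dockerfile).
    image = "generate_readme_index:latest"
    # Mount the repo into the container and run the script there.
    cmd = [
        "docker", "run", "--rm",
        "-v", f"{repo_dir}:/src",
        "-w", "/src",
        image,
        "python", "dev_scripts_helpers/documentation/generate_readme_index.py",
    ]
    subprocess.run(cmd, check=True)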

I'll change the title and pause this since it's low priority.

gpsaggese · Jun 02 '25 12:06