Extend coverage in pytest to cover when we run through system
Problem
In some tests, a unit test calls (e.g., Test_run_dockerized_pandoc in dev_scripts_helpers/documentation/test/test_dockerized_pandoc.py) another Python script via subprocess or os.system. the normal pytest-cov (or coverage.py) won’t automatically track the coverage of the subprocess.
This is the situation we have when we call llm_transform.py which calls a Docker script inside.
By default, coverage is only measured in the main Python process.
Solution
Read the related docs
- ./docs/tools/all.code_coverage.how_to_guide.md
- ./docs/tools/unit_test/all.run_unit_tests.how_to_guide.md
- ./docs/tools/unit_test/all.write_unit_tests.how_to_guide.md
One can solve this by telling the subprocess to run under coverage too and then merge the coverage output
> coverage run --parallel-mode your_script.py arg1 arg2
--parallel-mode tells coverage to write separate data files for each subprocess (.coverage.* files) instead of overwriting.
> coverage combine
> coverage report
In this case we want to get the coverage of llm_transform.py and then dockerized_llm_transform.py
- Study the problem and propose a solution
- Build a prototype with any hack necessary to show that it works
- Verify that running coverage before / after, one sees the improvement
- Document the problem and the solution in markdown
- Propose a generalization of our testing framework to "automate" the hacky solution
- Run the entire regressions showing that there is an increase in coverage (e.g., https://app.codecov.io/gh/causify-ai/helpers/tree/HelpersTask487_Add_Git_actions_for_Codecov_3/dev_scripts_helpers%2Fdocumentation)
Problem
coverage run does not capture coverage for subprocess calls and Dockerized scripts in llm_transform.py.
Approaches Tested
-
Manual Coverage Wrapping
- Wrapped subprocess calls with
coverage run --parallel-modein tests - Merged
.coverage.*files viacoverage combine - Achieved improved coverage for
llm_transform.py
- Wrapped subprocess calls with
-
Docker Coverage Integration
- Mounted host coverage directory into containers
- Installed
coveragemodule inside the container - Coverage data from Dockerized scripts not yet captured
Proposed Solutions
- Shell Wrapper (
wrap_coverage.sh)
Automates insertion ofcoverage run --parallel-modebefore every Python subprocess invocation. Plan is to create a shell script which checks processes running and gets their pids and routes back those processes to a python script which wraps a cmd and runs a the coverage for the process. - Detection Utility (
detect_and_wrap.py)
Scans subprocess commands at runtime and transparently wraps them for coverage i.e identify tests where cli commands are used and parse usingshlexand wrap the command during tests. - Coverage Script edit
Saw this solution online which updates
.coveragercand enables multiprocessing. (will have to test to verify)
Results
- Subprocess calls via
hsystemnow report coverage. - Dockerized script coverage remains unresolved—next steps include instrumenting the container entrypoint or using remote coverage collection.
Coverage Reports
Initial Coverage Report
user_1001@a4e7e65b4859:/app$ coverage report
Name Stmts Miss Cover
-------------------------------------------------------------------------
__init__.py 0 0 100%
conftest.py 74 33 55%
dev_scripts_helpers/__init__.py 0 0 100%
dev_scripts_helpers/documentation/__init__.py 0 0 100%
dev_scripts_helpers/documentation/lint_notes.py 211 184 13%
dev_scripts_helpers/llms/__init__.py 0 0 100%
dev_scripts_helpers/llms/llm_prompts.py 362 303 16%
dev_scripts_helpers/llms/test/__init__.py 0 0 100%
dev_scripts_helpers/llms/test/test_llm_transform.py 67 23 66%
helpers/__init__.py 0 0 100%
helpers/hdbg.py 387 270 30%
helpers/hdocker.py 483 434 10%
helpers/henv.py 216 92 57%
helpers/hgit.py 543 462 15%
helpers/hintrospection.py 125 102 18%
helpers/hio.py 329 229 30%
helpers/hlogging.py 292 155 47%
helpers/hmarkdown.py 368 320 13%
helpers/hparser.py 202 168 17%
helpers/hprint.py 421 238 43%
helpers/hserver.py 422 231 45%
helpers/hsystem.py 424 299 29%
helpers/htimer.py 116 72 38%
helpers/hunit_test.py 781 603 23%
helpers/hversion.py 85 21 75%
helpers/hwall_clock_time.py 40 19 52%
helpers/hwarnings.py 26 4 85%
helpers/repo_config_utils.py 151 114 25%
-------------------------------------------------------------------------
TOTAL 6125 4376 29%
Final Coverage Report (After Combine)
user_1001@84dfb77cfea4:/app$ coverage report
Name Stmts Miss Cover
-------------------------------------------------------------------------
__init__.py 0 0 100%
conftest.py 74 33 55%
dev_scripts_helpers/__init__.py 0 0 100%
dev_scripts_helpers/documentation/__init__.py 0 0 100%
dev_scripts_helpers/documentation/lint_notes.py 211 184 13%
dev_scripts_helpers/llms/__init__.py 0 0 100%
dev_scripts_helpers/llms/llm_prompts.py 362 303 16%
dev_scripts_helpers/llms/llm_transform.py 116 52 55%
dev_scripts_helpers/llms/test/__init__.py 0 0 100%
dev_scripts_helpers/llms/test/test_llm_transform.py 67 23 66%
helpers/__init__.py 0 0 100%
helpers/hdbg.py 387 256 34%
helpers/hdocker.py 483 349 28%
helpers/henv.py 216 92 57%
helpers/hgit.py 543 442 19%
helpers/hintrospection.py 125 102 18%
helpers/hio.py 329 229 30%
helpers/hlogging.py 292 155 47%
helpers/hmarkdown.py 368 320 13%
helpers/hparser.py 202 124 39%
helpers/hprint.py 421 230 45%
helpers/hserver.py 422 231 45%
helpers/hsystem.py 424 298 30%
helpers/htimer.py 116 72 38%
helpers/hunit_test.py 781 603 23%
helpers/hversion.py 85 21 75%
helpers/hwall_clock_time.py 40 19 52%
helpers/hwarnings.py 26 4 85%
helpers/repo_config_utils.py 151 81 46%
-------------------------------------------------------------------------
TOTAL 6241 4223 32%
user_1001@84dfb77cfea4:/app$
For dockerized scripts I'm unable to get coverage as the container exits before coverage files are collected and I'm unsure of how to proceed for this. I tried including STOPSIGNAL SIGINT in docker file but did not work as intended. Currently trying to figure out how to get coverage for this script.
Interesting progress. My mental plan was simpler than this. E.g.
- create a function that transforms a command line to be run inside Docker to track coverage
- run the code inside Docker to create a file
- the file is saved through the bind mount
- the caller merges the coverage reports
My gut feeling is that you are trying to do things "too transparently", while it's better (at least for the prototype) to get hooks in the caller and the callee. We don't have to build a general solution, but one that works for our dev system.
I'll review the PR in details, but pls send me an email and we'll schedule a pair programming session. We can add also @heanhsok
I will revise my approach with this in mind and make it align better with the dev system.
@madhurlak0810 I've posted the notes from our link here https://docs.google.com/document/d/1xPgQ2tWXQuVWKkGVONjOGd5j14mXSmGeY_4d1_sGzAE/edit?tab=t.v5h95jvu3hwe#heading=h.j6kcs1n3f3uz
Let's do baby step, like an example of a totally hacked prototype to show that the approach is clear and it works.
@gpsaggese I leveraged pytest-cov’s subprocess hook (coverage.process_startup()) exactly as in the docs i.e creating a coverage.pth to track every process. By installing coverage in the Docker image and mounting the host’s coverage directory into /app (so that COVERAGE_PROCESS_START=/app/.coveragerc is inherited), the container writes its data back to the host.
After rerunning these are the results:
SKIPPED [1] dev_scripts_helpers/llms/test/test_llm_transform.py:125: Run manually since it needs OpenAI credentials
===================================================================================== 3 passed, 1 skipped in 9.71s ======================================================================================
(client_venv.helpers) maddev@pop-os:~/src/helpers1$ cp coverage_docker/.coverage.* .
(client_venv.helpers) maddev@pop-os:~/src/helpers1$ coverage combine
Combined data file .coverage.pop-os.1.XLbvpcGx
Combined data file .coverage.pop-os.1.XcYkSoPx
Skipping duplicate data .coverage.pop-os.1.XcjLpxSx
Combined data file .coverage.pop-os.118946.XaoYDNdx
Combined data file .coverage.pop-os.119110.XvogpGXx
Combined data file .coverage.pop-os.119456.XvMkUUyx
Combined data file .coverage.pop-os.119592.XohLGmex
(client_venv.helpers) maddev@pop-os:~/src/helpers1$ coverage report
Name Stmts Miss Branch BrPart Cover
----------------------------------------------------------------------------------------
/etc/python3.10/sitecustomize.py 5 1 0 0 80%
__init__.py 0 0 0 0 100%
conftest.py 74 33 16 6 52%
dev_scripts_helpers/__init__.py 0 0 0 0 100%
dev_scripts_helpers/documentation/__init__.py 0 0 0 0 100%
dev_scripts_helpers/documentation/lint_notes.py 214 164 76 4 19%
dev_scripts_helpers/llms/__init__.py 0 0 0 0 100%
dev_scripts_helpers/llms/dockerized_llm_transform.py 32 4 8 4 80%
dev_scripts_helpers/llms/llm_prompts.py 377 41 44 8 86%
dev_scripts_helpers/llms/llm_transform.py 135 42 36 14 64%
dev_scripts_helpers/llms/test/__init__.py 0 0 0 0 100%
dev_scripts_helpers/llms/test/test_llm_transform.py 68 13 6 1 78%
helpers/__init__.py 0 0 0 0 100%
helpers/hdbg.py 387 254 136 22 31%
helpers/hdocker.py 513 359 86 10 28%
helpers/henv.py 216 91 46 4 50%
helpers/hgit.py 543 442 118 2 16%
helpers/hintrospection.py 125 102 48 0 13%
helpers/hio.py 332 227 118 12 27%
helpers/hlogging.py 292 155 90 21 42%
helpers/hmarkdown.py 404 323 152 0 15%
helpers/hopenai.py 270 152 82 13 38%
helpers/hparser.py 202 117 60 4 39%
helpers/hprint.py 421 232 134 14 42%
helpers/hserver.py 424 227 134 11 39%
helpers/hsystem.py 436 318 130 10 23%
helpers/htimer.py 116 60 14 3 45%
helpers/hunit_test.py 781 566 190 20 25%
helpers/hversion.py 85 24 14 3 65%
helpers/hwall_clock_time.py 40 19 10 1 44%
helpers/hwarnings.py 26 4 4 1 83%
helpers/repo_config_utils.py 151 82 28 6 43%
----------------------------------------------------------------------------------------
TOTAL 6669 4052 1780 194 35%
with this it seems to be working as intended. So fix steps are:
- inject coverage hook onto pythonpath to trace coverage of subprocess
- change coveragerc file for subprocess support
- edit docker file to include coverage and coveragerc
- mount coverage directory to coverage_docker in local
- copy files
- coverage combine
- coverage report
Ok so what we have found out is that there is a native mechanism for multiple processes (which is our case for when we do a system call) https://coverage.readthedocs.io/en/7.8.0/subprocess.html
For the Docker case (when we spawn a different container inside Docker), the same approach can work, even if the processes are not really in the same OS space, but there is no difference.
In other words, there is a hook to tell processes to start the coverage without us doing much more than setting some env variables (which we can easily do through the Docker compose).
Is this all correct? What's the next step then, just implement the final solution or do we need more prototype?
I'll add you to a Docker meeting on Thur and after that we can review / PP.
Yes, this is what we have found so far and is all correct. The built-in 'subprocess' support will work even when we are spawning a second docker container. We just need to edit the image and mount certain volumes differently to make sure coverage-data shows up. I will start implementing the final solution. Steps:
- Detect
docker runinvocation - Inspect the base image’s Dockerfile
- Update the Dockerfile (or Docker Compose) to enable coverage
- Install Coverage
- Set
COVERAGE_PROCESS_START - Mount a shared volume for coverage data(
./coverage_docker) - Rebuild the image or redeploy via Docker Compose.
- Combine coverage reports
- After all containers finish, run:
cp coverage_docker/.coverage.* . coverage combine coverage report - This merges the
.coveragefiles from both the tester and callee containers and generates a unified report.
- After all containers finish, run:
What do you mean by 1. Detect docker run? As to 2. we generate that automatically.
If you have doubts a draft PR with a "prototype" is best, rather then implementing and then finding out that the solution is not the right one.
@heanhsok PTAL at this and review the code
- We can detect docker run by watching tmp.system_cmd.sh so whenever a docker run is invoked we can use that.
- The base image is generated and the dockerfile is saved in tmp.build_docker so I was thinking I take that image and build the coverage image seperately(with the initial setup) and generate those files. I'll draft a PR as a prototype and update.
I'm not sure this matches my mental model.
The design principle is that the caller knows everything; the caller just does what it's told to do.
- If the caller is running in coverage mode, then it
- sets the right env variables (which are always passed through the docker compose)
- installs the right packages (e.g., the coverage) when it's calling the Docker container
- Note that there are two different ways of invoking Docker in our system: dockerized executables and runnable dirs. You should read the doc / white paper. So we need to make sure the solution works for both approaches.
- To make things simple I would have the coverage package always installed in both.
If the caller needs to run the coverage and the package is not present, then it dies.
Does it make sense?
Since this is getting a bit complicated, let's:
- write a design doc that explains everything we have discussed (the logic, the design principles, etc) in
./docs/tools/all.code_coverage.reference.md - And yes let's keep doing a MVP of the system, before making the code perfect, since we are still exploring / designing.
In any case, you are going in the right direction. This is a bit tricky.
- I will start writing the design doc that explains everything we have discussed.
- For the MVP I just want to clarify some things:
Just to clarify: if caller is active, we should ensure every Docker image installs the coverage package and adjusts its entrypoint (
entrypoint.sh) so that any command run inside whether it’s a “dockerized executable” or a “runnable dir” is traced by coverage and save coverage files? And if coverage is requested but not present in that image, it should immediately exit with an error or install coverage and hook?
- For the MVP I just want to clarify some things: Just to clarify: if caller is active, we should ensure every Docker image installs the coverage package and adjusts its entrypoint (
entrypoint.sh) so that any command run inside whether it’s a “dockerized executable” or a “runnable dir” is traced by coverage and save coverage files?
To simplify I would have the coverage package always installed in both cases, in this way you don't have to do weird gymnastics (attach the coverage on the fly, etc.). When the "caller is active", there are two options IIUC
- the caller changes the command to prepend the wrapper that does the coverage; or
- the caller sets the env vars hook for the coverage and nothing else is changed
The option 2) seems a bit more elegant, but it's also less explicit.
And if coverage is requested but not present in that image, it should immediately exit with an error or install coverage and hook?
If there is no package, there is an error (the caller doesn't even try to detect the issue).
Systems where parts of the systems try to infer the intention of other parts are impossible to maintain. The logic is that every piece of the system just does what it's told to do.
@gpsaggese @heanhsok I have made a draft PR with the respective design doc and the MVP where its currently working for docker run commands running in any script and reporting coverage as intended.
The PR is https://github.com/causify-ai/helpers/pull/754
The next step is to do a PP on Thur to look at the code in action and then merge and deploy
Remaining problems:
- Once the hook is injected, all the processes collect and generate coverage
- We don't have a solution yet
- We could use an invoke target that runs and disables coverage hooks when we run coverage
- How to make sure that
coveragepackage is available when needed- Let's make it an invariant: it should be in the template
devops/docker_build/pyproject.toml
- Let's make it an invariant: it should be in the template
Next steps for @Shaunak01 @heanhsok enable the coverage through the new system and see what happens
- [ ] Check that things that should be covered are covered