helpers icon indicating copy to clipboard operation
helpers copied to clipboard

Extend coverage in pytest to cover when we run through system

Open gpsaggese opened this issue 7 months ago • 6 comments

Problem
In some tests, a unit test calls (e.g., Test_run_dockerized_pandoc in dev_scripts_helpers/documentation/test/test_dockerized_pandoc.py) another Python script via subprocess or os.system. the normal pytest-cov (or coverage.py) won’t automatically track the coverage of the subprocess.
This is the situation we have when we call llm_transform.py which calls a Docker script inside.
By default, coverage is only measured in the main Python process.

Solution
Read the related docs

  • ./docs/tools/all.code_coverage.how_to_guide.md
  • ./docs/tools/unit_test/all.run_unit_tests.how_to_guide.md
  • ./docs/tools/unit_test/all.write_unit_tests.how_to_guide.md

One can solve this by telling the subprocess to run under coverage too and then merge the coverage output

> coverage run --parallel-mode your_script.py arg1 arg2
--parallel-mode tells coverage to write separate data files for each subprocess (.coverage.* files) instead of overwriting.
> coverage combine
> coverage report

In this case we want to get the coverage of llm_transform.py and then dockerized_llm_transform.py

  1. Study the problem and propose a solution
  2. Build a prototype with any hack necessary to show that it works
  3. Verify that running coverage before / after, one sees the improvement
  4. Document the problem and the solution in markdown
  5. Propose a generalization of our testing framework to "automate" the hacky solution
  6. Run the entire regressions showing that there is an increase in coverage (e.g., https://app.codecov.io/gh/causify-ai/helpers/tree/HelpersTask487_Add_Git_actions_for_Codecov_3/dev_scripts_helpers%2Fdocumentation)

gpsaggese avatar May 10 '25 03:05 gpsaggese

Problem

coverage run does not capture coverage for subprocess calls and Dockerized scripts in llm_transform.py.

Approaches Tested

  1. Manual Coverage Wrapping

    • Wrapped subprocess calls with coverage run --parallel-mode in tests
    • Merged .coverage.* files via coverage combine
    • Achieved improved coverage for llm_transform.py
  2. Docker Coverage Integration

    • Mounted host coverage directory into containers
    • Installed coverage module inside the container
    • Coverage data from Dockerized scripts not yet captured

Proposed Solutions

  • Shell Wrapper (wrap_coverage.sh)
    Automates insertion of coverage run --parallel-mode before every Python subprocess invocation. Plan is to create a shell script which checks processes running and gets their pids and routes back those processes to a python script which wraps a cmd and runs a the coverage for the process.
  • Detection Utility (detect_and_wrap.py)
    Scans subprocess commands at runtime and transparently wraps them for coverage i.e identify tests where cli commands are used and parse using shlex and wrap the command during tests.
  • Coverage Script edit Saw this solution online which updates .coveragerc and enables multiprocessing. (will have to test to verify)

Results

  • Subprocess calls via hsystem now report coverage.
  • Dockerized script coverage remains unresolved—next steps include instrumenting the container entrypoint or using remote coverage collection.

Coverage Reports

Initial Coverage Report

user_1001@a4e7e65b4859:/app$ coverage report
Name                                                Stmts   Miss  Cover
-------------------------------------------------------------------------
__init__.py                                             0      0   100%
conftest.py                                            74     33    55%
dev_scripts_helpers/__init__.py                         0      0   100%
dev_scripts_helpers/documentation/__init__.py           0      0   100%
dev_scripts_helpers/documentation/lint_notes.py       211    184    13%
dev_scripts_helpers/llms/__init__.py                    0      0   100%
dev_scripts_helpers/llms/llm_prompts.py               362    303    16%
dev_scripts_helpers/llms/test/__init__.py               0      0   100%
dev_scripts_helpers/llms/test/test_llm_transform.py    67     23    66%
helpers/__init__.py                                     0      0   100%
helpers/hdbg.py                                       387    270    30%
helpers/hdocker.py                                    483    434    10%
helpers/henv.py                                       216     92    57%
helpers/hgit.py                                       543    462    15%
helpers/hintrospection.py                             125    102    18%
helpers/hio.py                                        329    229    30%
helpers/hlogging.py                                   292    155    47%
helpers/hmarkdown.py                                  368    320    13%
helpers/hparser.py                                    202    168    17%
helpers/hprint.py                                     421    238    43%
helpers/hserver.py                                    422    231    45%
helpers/hsystem.py                                    424    299    29%
helpers/htimer.py                                     116     72    38%
helpers/hunit_test.py                                 781    603    23%
helpers/hversion.py                                    85     21    75%
helpers/hwall_clock_time.py                            40     19    52%
helpers/hwarnings.py                                   26      4    85%
helpers/repo_config_utils.py                          151    114    25%
-------------------------------------------------------------------------
TOTAL                                                6125   4376    29%

Final Coverage Report (After Combine)

user_1001@84dfb77cfea4:/app$ coverage report
Name                                                Stmts   Miss  Cover
-------------------------------------------------------------------------
__init__.py                                             0      0   100%
conftest.py                                            74     33    55%
dev_scripts_helpers/__init__.py                         0      0   100%
dev_scripts_helpers/documentation/__init__.py           0      0   100%
dev_scripts_helpers/documentation/lint_notes.py       211    184    13%
dev_scripts_helpers/llms/__init__.py                    0      0   100%
dev_scripts_helpers/llms/llm_prompts.py               362    303    16%
dev_scripts_helpers/llms/llm_transform.py             116     52    55%
dev_scripts_helpers/llms/test/__init__.py               0      0   100%
dev_scripts_helpers/llms/test/test_llm_transform.py    67     23    66%
helpers/__init__.py                                     0      0   100%
helpers/hdbg.py                                       387    256    34%
helpers/hdocker.py                                    483    349    28%
helpers/henv.py                                       216     92    57%
helpers/hgit.py                                       543    442    19%
helpers/hintrospection.py                             125    102    18%
helpers/hio.py                                        329    229    30%
helpers/hlogging.py                                   292    155    47%
helpers/hmarkdown.py                                  368    320    13%
helpers/hparser.py                                    202    124    39%
helpers/hprint.py                                     421    230    45%
helpers/hserver.py                                    422    231    45%
helpers/hsystem.py                                    424    298    30%
helpers/htimer.py                                     116     72    38%
helpers/hunit_test.py                                 781    603    23%
helpers/hversion.py                                    85     21    75%
helpers/hwall_clock_time.py                            40     19    52%
helpers/hwarnings.py                                   26      4    85%
helpers/repo_config_utils.py                          151     81    46%
-------------------------------------------------------------------------
TOTAL                                                6241   4223    32%
user_1001@84dfb77cfea4:/app$ 

madhurlak0810 avatar May 22 '25 00:05 madhurlak0810

For dockerized scripts I'm unable to get coverage as the container exits before coverage files are collected and I'm unsure of how to proceed for this. I tried including STOPSIGNAL SIGINT in docker file but did not work as intended. Currently trying to figure out how to get coverage for this script.

madhurlak0810 avatar May 23 '25 01:05 madhurlak0810

Interesting progress. My mental plan was simpler than this. E.g.

  • create a function that transforms a command line to be run inside Docker to track coverage
  • run the code inside Docker to create a file
  • the file is saved through the bind mount
  • the caller merges the coverage reports

My gut feeling is that you are trying to do things "too transparently", while it's better (at least for the prototype) to get hooks in the caller and the callee. We don't have to build a general solution, but one that works for our dev system.

I'll review the PR in details, but pls send me an email and we'll schedule a pair programming session. We can add also @heanhsok

gpsaggese avatar May 24 '25 21:05 gpsaggese

I will revise my approach with this in mind and make it align better with the dev system.

madhurlak0810 avatar May 24 '25 23:05 madhurlak0810

@madhurlak0810 I've posted the notes from our link here https://docs.google.com/document/d/1xPgQ2tWXQuVWKkGVONjOGd5j14mXSmGeY_4d1_sGzAE/edit?tab=t.v5h95jvu3hwe#heading=h.j6kcs1n3f3uz

Let's do baby step, like an example of a totally hacked prototype to show that the approach is clear and it works.

gpsaggese avatar May 28 '25 21:05 gpsaggese

@gpsaggese I leveraged pytest-cov’s subprocess hook (coverage.process_startup()) exactly as in the docs i.e creating a coverage.pth to track every process. By installing coverage in the Docker image and mounting the host’s coverage directory into /app (so that COVERAGE_PROCESS_START=/app/.coveragerc is inherited), the container writes its data back to the host.

After rerunning these are the results:

SKIPPED [1] dev_scripts_helpers/llms/test/test_llm_transform.py:125: Run manually since it needs OpenAI credentials
===================================================================================== 3 passed, 1 skipped in 9.71s ======================================================================================
(client_venv.helpers) maddev@pop-os:~/src/helpers1$ cp coverage_docker/.coverage.* .
(client_venv.helpers) maddev@pop-os:~/src/helpers1$ coverage combine
Combined data file .coverage.pop-os.1.XLbvpcGx
Combined data file .coverage.pop-os.1.XcYkSoPx
Skipping duplicate data .coverage.pop-os.1.XcjLpxSx
Combined data file .coverage.pop-os.118946.XaoYDNdx
Combined data file .coverage.pop-os.119110.XvogpGXx
Combined data file .coverage.pop-os.119456.XvMkUUyx
Combined data file .coverage.pop-os.119592.XohLGmex
(client_venv.helpers) maddev@pop-os:~/src/helpers1$ coverage report
Name                                                   Stmts   Miss Branch BrPart  Cover
----------------------------------------------------------------------------------------
/etc/python3.10/sitecustomize.py                           5      1      0      0    80%
__init__.py                                                0      0      0      0   100%
conftest.py                                               74     33     16      6    52%
dev_scripts_helpers/__init__.py                            0      0      0      0   100%
dev_scripts_helpers/documentation/__init__.py              0      0      0      0   100%
dev_scripts_helpers/documentation/lint_notes.py          214    164     76      4    19%
dev_scripts_helpers/llms/__init__.py                       0      0      0      0   100%
dev_scripts_helpers/llms/dockerized_llm_transform.py      32      4      8      4    80%
dev_scripts_helpers/llms/llm_prompts.py                  377     41     44      8    86%
dev_scripts_helpers/llms/llm_transform.py                135     42     36     14    64%
dev_scripts_helpers/llms/test/__init__.py                  0      0      0      0   100%
dev_scripts_helpers/llms/test/test_llm_transform.py       68     13      6      1    78%
helpers/__init__.py                                        0      0      0      0   100%
helpers/hdbg.py                                          387    254    136     22    31%
helpers/hdocker.py                                       513    359     86     10    28%
helpers/henv.py                                          216     91     46      4    50%
helpers/hgit.py                                          543    442    118      2    16%
helpers/hintrospection.py                                125    102     48      0    13%
helpers/hio.py                                           332    227    118     12    27%
helpers/hlogging.py                                      292    155     90     21    42%
helpers/hmarkdown.py                                     404    323    152      0    15%
helpers/hopenai.py                                       270    152     82     13    38%
helpers/hparser.py                                       202    117     60      4    39%
helpers/hprint.py                                        421    232    134     14    42%
helpers/hserver.py                                       424    227    134     11    39%
helpers/hsystem.py                                       436    318    130     10    23%
helpers/htimer.py                                        116     60     14      3    45%
helpers/hunit_test.py                                    781    566    190     20    25%
helpers/hversion.py                                       85     24     14      3    65%
helpers/hwall_clock_time.py                               40     19     10      1    44%
helpers/hwarnings.py                                      26      4      4      1    83%
helpers/repo_config_utils.py                             151     82     28      6    43%
----------------------------------------------------------------------------------------
TOTAL                                                   6669   4052   1780    194    35%

with this it seems to be working as intended. So fix steps are:

  1. inject coverage hook onto pythonpath to trace coverage of subprocess
  2. change coveragerc file for subprocess support
  3. edit docker file to include coverage and coveragerc
  4. mount coverage directory to coverage_docker in local
  5. copy files
  6. coverage combine
  7. coverage report

madhurlak0810 avatar May 30 '25 09:05 madhurlak0810

Ok so what we have found out is that there is a native mechanism for multiple processes (which is our case for when we do a system call) https://coverage.readthedocs.io/en/7.8.0/subprocess.html

For the Docker case (when we spawn a different container inside Docker), the same approach can work, even if the processes are not really in the same OS space, but there is no difference.

In other words, there is a hook to tell processes to start the coverage without us doing much more than setting some env variables (which we can easily do through the Docker compose).

Is this all correct? What's the next step then, just implement the final solution or do we need more prototype?

I'll add you to a Docker meeting on Thur and after that we can review / PP.

gpsaggese avatar Jun 02 '25 18:06 gpsaggese

Yes, this is what we have found so far and is all correct. The built-in 'subprocess' support will work even when we are spawning a second docker container. We just need to edit the image and mount certain volumes differently to make sure coverage-data shows up. I will start implementing the final solution. Steps:

  1. Detect docker run invocation
  2. Inspect the base image’s Dockerfile
  3. Update the Dockerfile (or Docker Compose) to enable coverage
    • Install Coverage
    • Set COVERAGE_PROCESS_START
    • Mount a shared volume for coverage data(./coverage_docker)
    • Rebuild the image or redeploy via Docker Compose.
  4. Combine coverage reports
    • After all containers finish, run:
      cp coverage_docker/.coverage.* .
      coverage combine
      coverage report
      
    • This merges the .coverage files from both the tester and callee containers and generates a unified report.

madhurlak0810 avatar Jun 03 '25 04:06 madhurlak0810

What do you mean by 1. Detect docker run? As to 2. we generate that automatically.

If you have doubts a draft PR with a "prototype" is best, rather then implementing and then finding out that the solution is not the right one.

@heanhsok PTAL at this and review the code

gpsaggese avatar Jun 03 '25 13:06 gpsaggese

  1. We can detect docker run by watching tmp.system_cmd.sh so whenever a docker run is invoked we can use that.
  2. The base image is generated and the dockerfile is saved in tmp.build_docker so I was thinking I take that image and build the coverage image seperately(with the initial setup) and generate those files. I'll draft a PR as a prototype and update.

madhurlak0810 avatar Jun 03 '25 14:06 madhurlak0810

I'm not sure this matches my mental model.

The design principle is that the caller knows everything; the caller just does what it's told to do.

  • If the caller is running in coverage mode, then it
    • sets the right env variables (which are always passed through the docker compose)
    • installs the right packages (e.g., the coverage) when it's calling the Docker container
      • Note that there are two different ways of invoking Docker in our system: dockerized executables and runnable dirs. You should read the doc / white paper. So we need to make sure the solution works for both approaches.
      • To make things simple I would have the coverage package always installed in both.

If the caller needs to run the coverage and the package is not present, then it dies.

Does it make sense?

Since this is getting a bit complicated, let's:

  1. write a design doc that explains everything we have discussed (the logic, the design principles, etc) in ./docs/tools/all.code_coverage.reference.md
  2. And yes let's keep doing a MVP of the system, before making the code perfect, since we are still exploring / designing.

In any case, you are going in the right direction. This is a bit tricky.

gpsaggese avatar Jun 03 '25 14:06 gpsaggese

  1. I will start writing the design doc that explains everything we have discussed.
  2. For the MVP I just want to clarify some things: Just to clarify: if caller is active, we should ensure every Docker image installs the coverage package and adjusts its entrypoint (entrypoint.sh) so that any command run inside whether it’s a “dockerized executable” or a “runnable dir” is traced by coverage and save coverage files? And if coverage is requested but not present in that image, it should immediately exit with an error or install coverage and hook?

madhurlak0810 avatar Jun 04 '25 06:06 madhurlak0810

  1. For the MVP I just want to clarify some things: Just to clarify: if caller is active, we should ensure every Docker image installs the coverage package and adjusts its entrypoint (entrypoint.sh) so that any command run inside whether it’s a “dockerized executable” or a “runnable dir” is traced by coverage and save coverage files?

To simplify I would have the coverage package always installed in both cases, in this way you don't have to do weird gymnastics (attach the coverage on the fly, etc.). When the "caller is active", there are two options IIUC

  1. the caller changes the command to prepend the wrapper that does the coverage; or
  2. the caller sets the env vars hook for the coverage and nothing else is changed

The option 2) seems a bit more elegant, but it's also less explicit.

And if coverage is requested but not present in that image, it should immediately exit with an error or install coverage and hook?

If there is no package, there is an error (the caller doesn't even try to detect the issue).

Systems where parts of the systems try to infer the intention of other parts are impossible to maintain. The logic is that every piece of the system just does what it's told to do.

gpsaggese avatar Jun 04 '25 16:06 gpsaggese

@gpsaggese @heanhsok I have made a draft PR with the respective design doc and the MVP where its currently working for docker run commands running in any script and reporting coverage as intended.

madhurlak0810 avatar Jun 06 '25 16:06 madhurlak0810

The PR is https://github.com/causify-ai/helpers/pull/754

The next step is to do a PP on Thur to look at the code in action and then merge and deploy

gpsaggese avatar Jun 16 '25 14:06 gpsaggese

Remaining problems:

  1. Once the hook is injected, all the processes collect and generate coverage
    • We don't have a solution yet
    • We could use an invoke target that runs and disables coverage hooks when we run coverage
  2. How to make sure that coverage package is available when needed
    • Let's make it an invariant: it should be in the template devops/docker_build/pyproject.toml

Next steps for @Shaunak01 @heanhsok enable the coverage through the new system and see what happens

  • [ ] Check that things that should be covered are covered

gpsaggese avatar Jun 19 '25 14:06 gpsaggese