
[Help needed] Adjusting build for code coverage

Open echoix opened this issue 6 months ago • 3 comments

I'm writing here to try to get a bit of help.

I've been working for the last two to three weeks to bring up some code coverage for the repo. The integration works quite well, and as I was nearing the end of it, I cross-checked that the results were trustworthy. What I observed was that for a no-code-change commit (changing a space in a comment of a workflow file, unrelated to the builds), code coverage changes (ups and downs, but not on code that is tested directly). I think I finally understood the issue, but it's the solution I can't get right yet.

So basically, here's how getting code coverage works, and how it is integrated into our repo as I have it now:

  • Using the codecov service to display the various results, the history through time, interaction with PRs, differences, etc. I even reworked the CI part last weekend since their beta v4 uploader action was finally released. To have some coverage to upload, we need to generate coverage files: that's what the following points cover.
  • For pytest, use pytest-cov, a plugin that integrates and configures coverage.py. The plugin helps to correctly measure Python code that is called through multiprocessing or subprocesses, and provides summary info at the end. It works quite well, but not a lot is covered with pytest.
  • For gunittest, use coverage.py directly. At a basic level, it is simply a matter of replacing a call to `python` with `coverage run`. But in our usage, that is not enough. Since some code is installed in $HOME/install (in the Ubuntu workflow) and isn't really inside the source tree, while the test code remains in the source tree (and is the current working directory), some hits are missed. I've got most of this worked out; it's not that much of an issue.
  • Still for gunittest, there's the fact that there are a lot of calls using subprocesses, both for C-based code and even for Python-based code. I had to configure some workarounds, as explained in the coverage.py docs, so that the Python instances launched as subprocesses hook up to coverage.py. This includes (through some env vars referred to in the config file) handling the fact that the cwd inside the subprocesses can be different, and getting the link back to the source code is tricky. Until I solve my other blockers and prove myself otherwise, I think it is mostly solved.
  • Still for gunittest, usage of multiprocessing has some similar problems, but they are solved by the same configuration made for subprocesses, so there's nothing extra to do.
  • That's it for Python for now. But grass isn't a Python-only repo: C and C++ make up a large part of the code base, including the backing library. There's also C/C++ coverage to collect, and that code gets tested with tests written in Python (or even shell) that also exercise Python code simultaneously.
  • That's where gcov (part of gcc) comes into play. We compile the code with some extra flags to add profiling support: counting arcs, reducing or disabling optimizations (so coverage still matches the code), and a linker flag too. On compilation, a *.gcno note file is placed next to every object file. Roughly, it contains the flowcharts and line numbers (in reality, only the information needed to create them from the object files). These object files are linked together with libgcov, and we have our instrumented program or library. While running, counters are incremented, and when the process ends, a *.gcda data file is created (by the extra code added to our program) that contains the counters/statistics. If the file already exists, like when the program is called a second time, the statistics are added to the same file; it is not replaced.
  • Once we've finished running what we want, we use the gcov tool, which takes a *.gcda and *.gcno file, along with the object file, and creates a *.gcov report (one per source or header file). These .gcov files are the real coverage report files. The codecov action takes care of finding all of these, running gcov with the correct options, and fixing paths so that the CI environment matches the repo in a machine-independent way.
  • Now we are reaching the tricky parts...
  • If two instances of the program are running, then terminate and try to write to the same .gcda file, what happens? It seems like only one of them wins (from my observations and some Stack Overflow). There are two solutions for this: a) use two env vars, one with a prefix to add to the path of the generated .gcda files (GCOV_PREFIX) and one to strip levels of directories off an object's path before applying the prefix (GCOV_PREFIX_STRIP). That makes it possible for a /user/build/foo.o to create a /target/run/build/foo.gcda. https://gcc.gnu.org/onlinedocs/gcc/Cross-profiling.html A difficulty with this approach is that our builds create OBJ folders at various depths, and I can't imagine a way that using it would be practical. b) At compile time, use -fprofile-dir= to specify the folder in which to place the *.gcda files. It evaluates %p as the process id and %v{VAR} as an env var. https://gcc.gnu.org/onlinedocs/gcc/Instrumentation-Options.html#index-fprofile-dir I already use -fprofile-abs-path so that the files are generated in the original folder whatever subdirectory they are in, keeping a full path through the mangled file name; it was needed for another reason too. gcovr, a Python tool that is an easier wrapper around gcov (and can create reports), has some other useful info: https://gcovr.com/en/7.0/guide/compiling.html#compiler-options I tried using coverage.%p, but ended up filling up all of the runner's disk space for the non-temporal half of the Ubuntu workflow. All of the jobs took way more time, probably 25-75% more (3 min -> 9 min / 7.5 min -> 16 min for pytest, 30.5 min -> 32 min / 34 min -> 36 min, or 34 min -> DNF). For the jobs that succeeded, collecting the data files, creating the gcov reports, and uploading coverage took about 1 min 9 s instead of about ten seconds. Locally, only building and starting the first couple of tests created about 3000 folders at the root of the repo, each with about 10-30 gcda files.
There were also folders scattered through the subdirectories, wherever other OBJ folders were. It's not realistic: I suspect that either collecting tests through pytest, or just importing a Python file, ends up loading our C-backed library somehow, so without even starting coverage there are already many copies of the data files. Yes, they can be merged back two by two, but I still end up needing to copy the gcno files next to the gcda files (or the opposite, copying gcda files back next to the gcno files) so that gcov can work. That is problem number 1.
  • I tried to avoid problem 1 temporarily by making pytest serial again, but coverage coming from pytest is still unstable. I suspect that even though only one test runs at a time, maybe more than one instance of a program/library tries to write and only one of them wins. So until I solve the serial test run case, there's no point in trying my idea of using the pytest worker ID (gw1, gw2, gw3...) env var as the name of the profile directory (limiting the number of subfolders needed).
  • Problem number 2 is related to how we build grass. Remember that a gcno file is created for each object file? When gcda files are written by the instrumented program, it checks that the existing gcda file matches the compilation unit it was created for (a checksum), such that after a recompilation, an outdated gcda file is reset on the first start. See some info, then I continue: https://stackoverflow.com/questions/68136230/gcda-file-coverage-info-not-getting-updated-when-two-executables-run-for-same-s https://stackoverflow.com/questions/72985855/libgcov-profiling-error-overwriting-an-existing-profile-data-with-a-different https://stackoverflow.com/questions/71582876/gcov-coverage-limited-to-test-files-in-minimal-g-project/71586011#71586011 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85759 The libgcov `overwriting an existing profile data with a different timestamp` message appears when building, for example, r.mapcalc/r3.mapcalc. I think that, apart from the possibility that these two targets are made multiple times, what is happening is that the common files are compiled twice, once for r.mapcalc and once for r3.mapcalc (when running make with -j > 1). So we end up with a gcno file overwritten by a second, different compilation. What should be happening, ideally, is that the common files, i.e. `$(AUTO_OBJS)` with the exclusions on both of these lines https://github.com/OSGeo/grass/blob/a1cadc78f3897d95f5f26e2d1c8d05fae4b3671d/raster/r.mapcalc/Makefile#L9-L10, should be compiled (once), and then the remaining specific files should be compiled, before linking them together in https://github.com/OSGeo/grass/blob/a1cadc78f3897d95f5f26e2d1c8d05fae4b3671d/raster/r.mapcalc/Makefile#L20-L21. I tried for 5-8 hours and didn't end up with something working. There's too much obscure syntax in the make files for me to understand what's going on.
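For reference, the subprocess hookup described in the gunittest points above follows the pattern from the coverage.py docs. Here is a minimal sketch of such a config; the paths and source list are illustrative guesses, not the repo's actual config:

```ini
# .coveragerc -- illustrative sketch only
[run]
# write one data file per process: .coverage.<host>.<pid>.<random>
parallel = True
branch = True

[paths]
# remap the installed copy back to the source tree so hits line up
source =
    python/grass
    */install/*/etc/python/grass
```

With `COVERAGE_PROCESS_START` pointing at this file and `coverage.process_startup()` invoked early in every Python process (coverage.py suggests a `.pth` file for that), `coverage combine` then merges the per-process data files into one.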
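Outside the GRASS Make system, the "compile the common objects once, then link them into both tools" layout that problem 2 calls for looks roughly like this generic sketch (all file names hypothetical, and this deliberately ignores the repo's include-based Make machinery):

```make
# Generic sketch only -- not the GRASS Make system. Because both programs
# depend on the same object files, even `make -j8` compiles each common
# object exactly once, so its .gcno file is never overwritten.
COMMON_OBJS = expression.o evaluate.o
R_OBJS      = main2d.o
R3_OBJS     = main3d.o

CFLAGS  += --coverage -O0 -fprofile-abs-path
LDFLAGS += --coverage

all: r.mapcalc r3.mapcalc

r.mapcalc: $(COMMON_OBJS) $(R_OBJS)
	$(CC) -o $@ $^ $(LDFLAGS)

r3.mapcalc: $(COMMON_OBJS) $(R3_OBJS)
	$(CC) -o $@ $^ $(LDFLAGS)

%.o: %.c
	$(CC) $(CFLAGS) -c -o $@ $<
```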

I hope that the way the recursive makes work here, with their dependencies (which I don't completely understand), doesn't end up recompiling the same library files a dozen times. Since recursive makes don't track dependencies well, I could imagine that at the beginning, some early-used targets' dependencies are launched more than once, until one of the targets from a parallel job completes successfully.

So that's it, I'm stuck with make and don't know how to unblock myself.

echoix, Feb 06 '24 03:02