bazel Implement coverage reports for Starlark

If the new flag --starlark_coverage_report is set to a non-empty value, the Starlark coverage recorder will be enabled for all .bzl files under the output base executed during the Blaze command. At the end of the command, the collected coverage data will be emitted to the file path given as the flag argument in LCOV format, containing function, line, and branch coverage information.

May 30 '22 13:05 fmeum

CC @Wyverald

I generated reports for some rulesets and found the results to be relatively accurate, but this would definitely benefit from more testing (it doesn't come with test cases yet) and polish. I'm also not sure whether it works for repository rules and/or functions executed during the execution phase, e.g. map_each callbacks. I wanted to open a PR to get early feedback and will happily fix bugs and supply tests if this is generally viewed favorably.

May 30 '22 13:05 fmeum

Thanks for the PR, Fabian! I'll leave it to Jon & Sasha to review this. (NB: Jon is OOO until 9 June.)

Also cc @adonovan

May 30 '22 14:05 Wyverald

I fixed some bugs and refactored the branch logic, but will not make further changes.

May 31 '22 08:05 fmeum

Thanks for sending this, but a change this big needs a design discussion before code. What problem are we trying to solve? What design do you propose? How can we make it simpler?

Sorry, I certainly didn't intend to skip proper process here. I wrote this code out of a concrete need to better understand the edge cases of rulesets I maintain and to improve my understanding of the Java Starlark implementation. I just thought about sharing it hoping that it could be useful for others. I am completely open to rewriting this from scratch or scrapping it entirely, depending on what the outcome of the design discussion is.

My initial questions are:

Does this make the interpreter slower? A lot of work has gone into making it as fast as it is (although there is still room for improvement). Benchmarking is time consuming because the JVM is noisy---execution time may vary by 5% or more---making it hard to find regressions down at the 1-2% level.

The way the hooks are currently implemented, I won't be able to dismiss this concern with confidence. As you said, benchmarking doesn't necessarily catch all regressions on such a low level.

I do think that there is a way to implement the coverage hooks without any overhead in the case where coverage isn't being collected:

Make the static CoverageRecorder instance final and populate it based on e.g. a system property starlark.lcov_report_file being set to a non-empty value.
Make --starlark_coverage_profile a startup option that sets this system property for the Bazel server JVM.
Add logic that dumps the report on server shutdown.

The static final field should be inlined by the JIT - if we want to be certain, we could also make it a simple boolean that triggers a coverage slow path if true. I could then double-check that the code emitted by the JIT doesn't change with the PR merged.

Do you like this approach better?

Do we need all this new machinery? In principle, a coverage tool could be implemented as a debugger that simply steps through the whole program one instruction at a time recording the current pc. We already have a primitive debugger. Can its hooks be used to record coverage, with minimal changes?

After a first read of the code, I'm inclined to say this would work for statement coverage, but not for more fine-grained notions of coverage.

Do we need branch coverage? I notice the tool records control jumps such as break statements and the short-circuit in if true || true {, and information about individual parameters. Is this necessary? Statement coverage is good enough for most purposes and is less invasive.

I agree that statement coverage would already be quite useful. However, given the general conciseness of Starlark (especially with list/dict comprehensions and conditional expressions), a single statement can encode rather complex logic. With the rulesets I have tried this on, statement coverage has almost always been nearing 100%, but branch coverage revealed some interesting uncovered cases (e.g., handling of workspace vs. external repository, checks for missing/superfluous files). While branch coverage information for breaks may not be as useful, I found it better to cover all branches rather than just some of them.

Assuming we find a way to ensure that coverage support introduces no overhead at all (as possibly described above), would you still prefer to remove some or all of the branch coverage hooks?.

Jun 02 '22 12:06 fmeum

@adonovan Would you like me to prototype the static final approach and take a look at the JIT output? I think that this has a fair chance of alleviating the performance worries entirely.

Jul 17 '22 16:07 fmeum

Finally taking a look at this, as I promised Fabian back at BazelCon.

It looks like there's quite a bit more code to this PR than I originally thought. I'm interested in supporting more tooling for Starlark in the new year, coverage included, but in light of

the added code complexity (who will maintain all this extra logic?),
concerns of performance regression, and
uncertainty about the extent of the benefit of this feature to users, and the relative benefits of expression granularity vs statement granularity,

it's clear that a quick acceptance of this PR is off the table. I'd love to have a more involved discussion centered around the benefit vs maintainability tradeoff in Q1. I'll keep the PR open for now.

Dec 19 '22 18:12 brandjon

@brandjon @tetromino Friendly ping on this.

I could trim this down to just line coverage and ensure that there is no runtime overhead if coverage is disabled.

Mar 05 '24 12:03 fmeum