rebar3 cover aggregation bug

I'm using both proper and ct on a project and aggregated cover data is weird. Example: I have 95% coverage on a module with proper, but aggregated coverage is smaller than that. When I look into aggregated coverage data, all code that is used is marked green, but there are lines that should be ignored (-specs, fun defs, even comments!) that are marked in red.

I don't know if this is actually tied to rebar3, but I've found nothing mentioning "aggregate" in cover module, so I suspect that this is rebar3 feature.

Debug and diagnostic output gave me nothing, I believe it's algorithm error.

Willing to collaborate on the solution.

Feb 26 '24 14:02 MarkoMin

Cover aggregation is done as follows:

the aggregate analysis is run independent of the other analysis steps for individual tools
all the coverage files are imported. This is done with cover:import/1, which is additive.
The aggregate analysis is performed by calling cover code on a per-module basis
The percentages are aggregated from total covered lines vs. uncovered lines vs. total lines
Once the aggregate parsing is done tool-specific analysis files are analyzed independently

Since the aggregation is done implicitly by loading multiple coverage files at once (as supported by cover) and then running the analysis on them, it's likely that any bug in processing is within the OTP library in Erlang, or through potentially issues in the module or file path tracking of individual coverage tracking step.

The actual tracking of which calls are handled or not is handled by the cover module which compiles modules and tracks data in a database, which we can then flush to files. It's possible there are bugs in how we invoke the module, but if the issues are "which lines are instrumented or not", this is likely to lie within the OTP libraries and not rebar3.

Feb 26 '24 15:02 ferd

Turns out it was my fault that I didn't update all cover data after updating a file. More info in OTP bug.

Would it make sense to throw an error when cover data for a module aren't compatible to be aggregated?

Mar 11 '24 11:03 MarkoMin

What would make them incompatible? Like changing the source files between runs?

Looking at the other ticket, we might want to aggregate cover data with a "last build" time or some hash and be able to maybe warn when we're like "this coverdata may be stale", but since we don't really track them as build artifacts, it'd need to be pretty specific.

Mar 11 '24 14:03 ferd

What would make them incompatible? Like changing the source files between runs?

Yup

If we can get out of this without introducing new state that is saved between rebar3 runs, that would be awesome.

Maybe we could utilize the fact that module compilation is cached. I was thinking something like "If coverdata for module X is generated before X is lastly compiled, warn the user. If there are multiple data for module X and at least one is generated before X is lastly compiled, warn the user".

In the former case, worst you can get is outdated (but correct) cover analysis. In the later result it would also make sense to throw an error, because you can get a result with comment lines or empty lines being marked as used/unused - which is incorrect.

Mar 11 '24 15:03 MarkoMin

Yeah looking at the touch times would work at least, but will make the checks a tiny bit slower. Probably worth it. OTOH you'll have some issues where some files (eg. .hrl files or parse transforms) may change the behavior of .erl files elsewhere without necessarily updating the .beam files' stamps, so the method won't be perfect. But fixing that sort of requires reusing the whole graph the compiler uses to properly track things, and that would be complex.

The heuristic of "is this file newer?" is probably helpful enough.

Mar 11 '24 20:03 ferd

Yeah looking at the touch times would work at least, but will make the checks a tiny bit slower. Probably worth it. OTOH you'll have some issues where some files (eg. .hrl files or parse transforms) may change the behavior of .erl files elsewhere without necessarily updating the .beam files' stamps, so the method won't be perfect. But fixing that sort of requires reusing the whole graph the compiler uses to properly track things, and that would be complex.

It could change the the behavior of .erl files, but one can't end up with coverdata that targets comments, specs or empty lines (at least I can't imagine so). Behavior can also be changed by modifying tests, so I think we can't (simply) guarantee that coverdata represents current state of the codebase, but we may at least solve the issue of non-code lines being used in coverdata reports.

Resulting guarantee would be: "each code line is either used or unused, each non-code line is ignored"

EDIT: prefix guarantee with "if there is no warning: "

Mar 11 '24 21:03 MarkoMin

rebar3 rebar3 copied to clipboard

cover aggregation bug

rebar3
rebar3 copied to clipboard