lcov icon indicating copy to clipboard operation
lcov copied to clipboard

Differential coverage, date and owner binning - plus a few minor enhancements and refactoring (was "Diffcov initial")

Open henry2cox opened this issue 4 years ago • 18 comments

This request contains an implementation of differential coverage + date- and owner- binning. A paper which describes the approach and the development methodology that they enable can be found at https://arxiv.org/abs/2008.07947. The basic idea is to take advantage of history to identify un-exercised code which has been recently added or changed, as well as unchanged code which is no longer exercised. The two features - differential and binning - are orthogonal in that you can use either, both, or neither. With neither feature enabled, the result is the same as 'vanilla lcov' that we see today. The new features were added to the 'gendiffcov' script - which is a fork of the original 'genhtml'. I did this initially so that I could run both scripts for testing, and then left it this way. 'genhtml' is a subset of 'gendiffcov' now - so it isn't necessary to keep both. There are some other changes to extract common functionality into a perl module, as well as some 'filtering' to make branch coverage statistics usable in our framework by ignoring branch data for lines which do not seem to have any conditionals.

Additional comment added 23 Sept 2020: A good way to understand the differential coverage and date/owner binning features is to run the testcases. Clone repo and unpack cd .../lcov/tests/gendiffcov/simple make now open the 'index.html' file in each of the generated subdirectories. They test the various combinations of differential vs legacy coverage, binning or not - as well as some options to control which data is displayed. Note that all these tests use the same tiny example from the above paper - so features related to multiple source directories and multiple source files per directory - are not evident.

henry2cox avatar Jul 31 '20 10:07 henry2cox

Thank you for your contribution to LCOV. Differential coverage analysis support has been a long-standing TODO for LCOV and from a first look, your implementation seems very feature-rich.

That said, I do think that this patch set will need a lot of work before it can be considered for inclusion into the LCOV mainline code. I think the effort would be worth it though, both for your internal users who would gain potential improvements to usability and stability and also long-term support via the upstream integration, but also for LCOV users who can then make use of this new functionality.

I can offer to support you with my review feedback. Below I've listed some areas that need changes.

1. Split up commits

The amount of code contained in your patch set makes it very hard to handle from a reviewer's point of view. Please split it up into smaller chunks, where each one relates to one functional aspect.

Here are some functional aspects that deserve separate commits:

  • author binning
  • date binning
  • differential coverage
  • filtering

Additional commits may be required for staging (e.g. for splitting out the renaming required when introducing package concepts), or introducing prerequisite functionality.

2. Integration

Your current implementation duplicates a lot of genhtml and genpng code, and unnecessarily introduces new executables for functionality that can be integrated into existing tools. I would suggest to merge the gendiffcov tool into genhtml, and the gendiffpng tool into genpng.

I would also ask you to try to minimize the changes to the existing logic as much as possible. This will help keep the code readable and maintainable, and reduce the chance of introducing regressions into the existing functionality.

As an example take function write_source_line() in gendiffcov/genhtml: gendiffcov generates the same type of output line, with some new and some changed components. Your implementation looks like a major rewrite of this function, which makes it difficult for readers to relate the new lines to anything done in previous commits, and which increases the chance for new bugs.

A better approach would be to add additional calls to new subroutines that encapsulate the logic to generate the new pieces into the existing function logic. The original logic should be kept intact when possible.

3. Coding style

I understand that the coding style used in large portions of LCOV is somewhat special but mixing different styles makes maintaining code an unnecessarily complex task.

Here are some rules that your code should follow:

  • use words_separated_by_underscore instead of camelCase
  • use tab indentation - a tab expands to 8 blanks
  • if the indentation makes your code move too far to the right, this is an indication that the code is too complex and should be split up into multiple functions
  • put opening braces for subroutines on the next line after the declaration
  • function prototypes should declare parameters (sub func($$))
  • add comments to function declarations that describe their purpose, and if they work on complex datastructures (e.g. hashes) then also describe their layout
  • add comments before complex code blocks
  • add newlines for readability after blocks of variable declarations, blocks of related code, etc.
  • remove all non-functionality whitespace changes

Also each commit must be accompanied with a descriptive commit message including a Signed-off-by line. See also Contribution guidelines

4. Usability

I find it very difficult to memorize the TLAs your implementation assigns to each class of coverage state change, and to decipher their meaning. Also the color coding seems to add very little extra information.

I'd like to suggest to label the lines, functions, branches in a different way that can be more easily deciphered intuitively. Instead of giving a hard-to-memorize TLA to each state change, why not spell it out?

As an example, the state of a line in a coverage set according to what I think the TLAs mean can be either one of the following:

  • Does not exist
  • Exists but is not covered
  • Exists and is covered
  • Line is excluded (though I'm not sure how you detect excluded lines)

You could simply assign an easy to memorize symbol to each of these states, e.g.:

  • #: Does not exist
  • 0: Exists but is not covered
  • 1: Exists and is covered
  • X: Line is excluded

Then you can represent the state changes from base to current via a transition such as base -> current. This would give the following TLA mapping:

  • Coverage rate changes

    • UBC: 0->0
    • GBC: 0->1
    • LBC: 1->0
    • CBC: 1->1
  • Insertions

    • UNC: #->0
    • GNC: #->1
  • Deletions

    • DUB: 0->#
    • DCB: 1->#
  • Removed exclusions

    • UIC: X->0
    • GIC: X->1
  • Added exclusions

    • EUB: 0->X
    • ECB: 1->X

A similar, abbreviated version could be used for representing branch state changes such as: +- ++ -- +# #+ +X

Regarding color coding I think it would make the most sense to use colors only to indicate that an element needs to be looked at, in order of priority:

  • red: anything that changed to 0 with the exception of 0->0
  • yellow: 0->0
  • blue: anything else

5. Miscellaneous

Other observations/feedback:

  • Is there really a need for DateTime/Format/W3CDTF.pm? Requiring new modules always causes fall-out with users
  • Remove debugging outputs and comments
  • --show-details seems to be deactivated in gendiffcov
  • Do not change the line coloring when no base-file is specified
  • Create an lcov directory in $LIB_DIR when installing and put all lcov library installables there
  • Use a plug-in concept for the annotation scripts, e.g. --annotate git => uses script in /usr/lib/lcov/annotate/git
  • Scripts should check any non-standard tooling that is required to function (e.g. p4) and abort with a dedicated error message if those are missing
  • gendiffcov emits warnings when combining -s or --legend with branch coverage
  • Add more tests

oberpar avatar Aug 11 '20 15:08 oberpar

Hi –

Thanks for the feedback. I had a few questions and a couple of comments.

My first and most basic comment is that I’m not wedded to this particular implementation – and would be quite happy to see someone else pick up the code and redo it in a better/cleaner way. I do want to maintain the various features in any such rewrite, though – as my internal customers are using all of them. Not all groups use all the features – but all the features are used by at least one group – so someone would be unhappy, if functionality went away.

My second comment is that yes – that is exactly the reason that this pull request exists. I would like the see the features generally available – so I don’t have to maintain my own versions, internally. I also think that this is a pretty generally useful idea – particularly for large-scale projects with a lot of code and many developers. The various features and tabular summaries make it relatively easy for developers to find and fix holes in their tests, and makes it quite easy for the project lead or program manager to keep track of status, even without knowing the details of the software architecture and physical structure of the code base, or even details of who is working on what – or who had worked on what, 18 months ago.

I think I mentioned that we use this same tool for both software (C/C++, java) and hardware (Verilog/SystemVerilog).

More detailed questions/comments now:

With respect to splitting up the commits:

  •      I fear that I am not sure how to do that – except possibly by restarting with a clean repo and then re-implementing each part, piece by piece.  Is there a better way?
    
  •      The other problem is that the various pieces (apart from filtering) all work together:  date binning knows how to deal with differential coverage, etc.  It isn’t quite possible to split the code to have date binning independently, then differential coverage independently, etc.
    
  •      From a historic perspective:  my development order was actually:
    

o Refactor a bit to use packages – to make subsequent change easier, then

o Implement differential coverage categorization (for line coverage), then

o Implement date and owner binning (for line coverage) on top of that, then

o Implement branch coverage (differential + binning), then

o Implement filtering when it became clear that branch coverage was unusable without it.

  •      It isn’t spectacularly difficult to recreate a commit order something like the above – but it is quite difficult or impossible to have much finer granularity which is still complete and usable at every increment
    
  •      I’m reluctant to try to do this, without some more detailed guidance about what the final result would look like, if this task was done perfectly.
    

With respect to replication of genhtml/genpng and gendiffcov/gendiffpng: I agree. It isn’t necessary to have both of them – and it is relatively straightforward to fold the new functionality into the existing scripts. That may involve extensive rewrite of some methods; hopefully, we have sufficient tests to catch any regressions that are introduced.

With respect to coding style: I think that this is a religious issue – and it is never a good idea to get into a religious debate. I can attempt to comply – but I we need an emacs mode and a lint checker/reformatter, if we want to be sure.

With respect to differential categories: “TLA” stands for “Three Letter Acronym” – and is borrowed somewhat tongue-in-cheek from the Purify tool suite, from the early 90s. My experience with those tools – and my current experience with the differential coverage tools – is that the acronyms become self-documenting rather quickly. They correspond to meaningful phrases in English, and re-use the same letters to mean the same thing in each acronym. We tend to use the acronyms (quite a bit) in tables (as opposed to source code detail views). In the various HTML views, TLA entries typically have a ‘title’ popup attribute, which expands to long name.

With respect to exclusions and deletions:

  •      “excluded” means that there is no code on that line in ‘current’ (ie., no count – not even a zero count) – but there was in the ‘baseline’ – and the line itself has not changed (identical text in baseline and current – no entry in the ‘diff’ file)
    

o There are two easy ways to get excluded code: change a #define to exclude something (or add a block comment around it), or change template/inline usage such that the code is no longer instantiated.

o The converse “included” categories are similar: the source text has not changed, but some lines which were not code in “baseline” now appear as code in current (possibly with a zero count).

§ The zero-count issue is the reason for the “--filter line” feature. It appears that GCC will sometimes count the closing brace of a function/method (either hit or not hit) and sometimes not – and that this can change even when the function text has not changed. The result of this issue is that we sometimes see non-zero “UIC” counts associated with these lines – which causes our Jenkins job to go ‘yellow’ – but which are gcc artifacts that we don’t care about.

  •      “deleted” code is related – but different.
    

In this case: there was code on the line in ‘baseline’ – but the line has disappeared and is no longer present in ‘current’. Note that I don’t try to distinguish “changed” code. A “changed” line will correspond to a “deleted”/”inserted” pair – and the fidelity of delete/inserted is at the whim of the tool used to generate the ‘diff’ file. (Anecdotally: ‘git’ is better at producing minimal diff files than ‘p4’)

Color coding is user-configurable (and can be easily changed in the CSS file); The idea was to use red/orange/yellow-ish hues to indicate missing coverage, green-ish for covered code, and something else to show exclusions. In tables, we colorize only the entries we think that users (or managers) should look more closely at.

henry2cox avatar Aug 11 '20 17:08 henry2cox

When fixing an issue that one of my users discovered, I noticed that I had forgotten to add/push one of the key files needed by the 'test/gendiffcov/simple' testcase (to exercise the 'annotate' code, without requiring an actual P4 or git repo).

henry2cox avatar Aug 13 '20 10:08 henry2cox

I just submitted some more changes and fixes. In the process, I discovered that I had somehow failed to submit some lines in some files - not sure how that happened). I checked that things are complete (in several different ways) this time.

With respect to the earlier comments:

Is there really a need for DateTime/Format/W3CDTF.pm? Requiring new modules always causes fall-out with users

I use the W3CDTF package to parse timestamps returned from 'git' and 'p4' in the annotate script. It is possible to remove this use from the gendiffcov (genhtml) source - but then we push the burden to the user to handle the common case of translation from revision control date format to age-in-days. I think it is more reasonable to put that common functionality in a central location. It IS possible to selectively load the module so as to not error out if the module is missing, unless it is actually required by the annotate feature (that is: do not emit an error until we need the package and cannot find it).

--show-details seems to be deactivated in gendiffcov

Fixed. '--show-details' also puts entries into the corresponding TLA column for 'hit' lines or branches (i.e., the GNC, GIC, GBC and CBC columns).

Do not change the line coloring when no base-file is specified

Fixed.

Use a plug-in concept for the annotation scripts, e.g. --annotate git => uses script in /usr/lib/lcov/annotate/git

I do not think I understand this request. Could you elaborate.

Scripts should check any non-standard tooling that is required to function (e.g. p4) and abort with a dedicated error message if those are missing

The 'git' or 'p4' or other revision control system interface is called from within the user's 'annotate' script. I provide a couple of samples - but the expectation is that many users will have to write their own (using these as a model). There is no way for me to know what other dependencies the user script may have. The sample scripts only work in very simple build environments where the revision control layout and the build layout are identical (no renaming or links); my experience is that almost all non-trivial projects are more complicated than that.

gendiffcov emits warnings when combining -s or --legend with branch coverage

I could not reproduce the issues. The option seems to work in my sandbox.

Add more tests

Is there an interface that we can use, to generate .info files for the perl tests? That way, we could run the testsuite - and check the differential coverage :-) (We do exactly this for Java, internally - but I haven't done anything about perl, yet. My intent is to add the Java support scripts to the 'scripts' library directory, though.

I hope this helps.

Henry

henry2cox avatar Aug 14 '20 19:08 henry2cox

Just pushed some additional functionality. Now generating 'project summary tables' using date or owner bin as the primary key. Tables are created at both the top-level and directory level. This feature is quite useful in project status or or project review meetings - to see what has been worked on lately, who is working on what, and whether coverage holes remain which need to be addressed before the next release. (This feature was requested by one of my internal users - and has turned out to be quite popular with the others.)

henry2cox avatar Aug 29 '20 21:08 henry2cox

...and pushed another set of changes. This contains some changes required to support Verilog expression coverage (note that the Verilog interface code is not upstreamed, yet. We need some explicit permissions from Synopsys, Cadence, and Mentor - as well as approval from our upper management.) I also added more consistency and error checking, especially to the 'diff' and 'annotate' interfaces - mainly driven by some common mistakes and problems that I have seen as new teams adopt the tool. Finally, there are a few bug fixes.

henry2cox avatar Sep 11 '20 10:09 henry2cox

As above: just pushed another enhancement, to support hierarchical HTML report (following directory structure of source code). This addresses issue #97 that I filed earlier this week

henry2cox avatar Oct 03 '20 11:10 henry2cox

The above commit implements the feature (workaround) described in issue #98 that I filed a day or two ago.

henry2cox avatar Oct 08 '20 10:10 henry2cox

Pushed support for differential categorization of function coverage metrics (along with few bug fixes and some minor refactoring).

henry2cox avatar Oct 28 '20 14:10 henry2cox

Sorry for not getting back to you on this PR sooner. I'm still hopeful that I'll be able to help with getting this merged, but given the size of changes this will need significant focused time that I'm still trying to find.. 😓

oberpar avatar Jul 21 '21 13:07 oberpar

No worries – I understand “busy”.

From my company’s perspective: we are using these features (in anger) on a number of software/firmware and hardware projects. As such – somebody is going to support and maintain the tools, regardless of the rest of the world. From my perspective: I would far rather have this upstreamed and merged to master so that other people can use it and enhance it. We can all benefit – and my support cost is reduced or eliminated.

At this point, the changes fall into 3 categories (but all 3 are closely coupled and mostly rather difficult to separate):

  •      Major features:
    

o Differential coverage categorization, date binning

§ These are both described in the paper referenced in the updated README. There is also a recording on zenodo.

o Coverpoint filtering

§ Some kind of ugly hacks to ‘fixup’ some gcov artifacts which otherwise make certain lcov metrics unusable/useless/not actionable

§ Support for merging function name aliases (e.g., due to inlining)

§ (one can argue that this is not a “major feature” or substantial change

  •      Minor features/reporting changes or enhancements/bug fixes
    

o Quite a lot of these, at this point - for example

§ to allow user to override ‘header’ and ‘footer’ text on HTML pages: this real estate is too valuable to waste with uninformative text. We put build IDs, project names, contact names, etc. in these locations.

§ Stable sorting for function names in coverage reports

§ … and so on

  •      Substantial refactoring:
    

o Where I could, I moved duplicated utility code from genhml/lcov/geninfo to a (new) ‘lcovutil.pm’ library.

§ This simplifies ongoing support and maintenance – especially where new features (such as filtering) interact with utilities.

If you want to schedule some time to walk through the code and the code changes together – please let me know. I’m happy to do so. It might be faster that way than working through line-by-line on your own.

Henry

henry2cox avatar Jul 21 '21 15:07 henry2cox

I thought it might be helpful to summarize the current set of features implemented in this branch. These features have been in use across several software, firmware, and hardware projects for more than 2 years. The fall into a few categories

Coverage-related features:

  • Differential coverage (see paper at https://ieeexplore.ieee.org/document/9438597)
  • Date/owner binning (see paper)
  • Related: various “sorted” views and associated hyperlinks to aid comprehension.

Visualization-related features:

  • Annotated source view and summaries – see sample testcases Currently integrated with git and perforce (annotations also used for date/owner binning)
  • Added support for “hierarchical” reporting
  • (Optional) Merge function aliases into parent group.
  • Improved sort stability – e.g., between tools executions over slightly modified data
  • Various features for control of page and table titles, tooltips, etc.

Automation-related features

  • User-specified ‘coverage criteria’ - so that coverage results can be used as a regression test pass/fail criteria (discussed in paper, above).

Other features:

  • “Filtering” - to remove/ignore certain misleading artifacts in coverage data (see man pages for more detail)
  • Error message categorization and suppression (see man pages)
  • Added support for certain features found in RTL code (hardware language)
  • File version and/or signature verification during data merge and data presentation Useful/necessary in order to verify that the data to be merged refers to the same source (or not). This is particularly useful in an enterprise environment when component coverage data is merged across multiple projects.
  • Add support for parallelism – to speed up processing for larger projects.

Other code changes:

  • Refactored to move common functions and duplicated code into a library.

At this point, all of the above features are an interdependent unit. It is not straightforward to accept the “annotation” features while rejecting the “common library” refactoring. It is not straightforward to accept “date/owner binning” without also accepting “differential coverage” – and so forth.

However, it is also true that the various new features are orthogonal: you can use any combination independently. It is also true that the new features are backward compatible with ‘base lcov’: if you want to use only “line coverage” with no annotations: you can do so – and the generated reports will look the same.

henry2cox avatar Apr 28 '22 11:04 henry2cox

I'd like to finally merge this patch set, but the current list of commits is very complex and I'd like to keep the merged commit history simple, so here's how I think I could handle this:

  1. Merge all patches that introduce new function together (including bugfixes for newly introduced code). Is there any set of patches that you want to keep separate?
  2. Keep fixes that are unrelated to new code separate
  3. Add more verbose commit messages to the new function patches
  4. Fix author e-mail addresses and add signed-off-by lines for your patches with address Henry Cox <[email protected]>

If you agree with this approach I will perform all of the steps above and make the result available for your review before actually pushing it to the master branch. The resulting series will include all of the code in this PR.

Since you used a different coding style for some code (e.g. indentation depth), I might also apply a follow-on coding-style patch that brings the code to a consistent type.

oberpar avatar May 03 '22 14:05 oberpar

Good to hear from you – and nice to hear about the possibility of merging. 😊

I think your plan sounds good – and I’m happy to review whatever you did (I just don’t want to reformat/separate/etc myself).

Once a new version is available, I will download it and try it in our development environment (i.e., on some of the projects that actually use the various features, in anger). While the unit tests do check some basic behaviour – they are far from complete (yeah – this is bad…kind of inevitable…but bad). IMO: it would be great if we could do a bit more to integrate the perl coverage tool – so we knew what we have actually exercised and not. (Even better, to make that report use differential coverage and binning too. On my list – but not done yet).

Note that I just pushed another small patch (to apply multiple pathname substitution regexps, in geninfo/lcov/genhtml). It turns out that all but one project that I’m aware of had required ‘sed’ post-processing on the generated data files – and this enhancement eliminated that step. One could argue that this slightly decreased the complexity of the flow (…fewer steps and fewer intermediates) at the cost of decreased debuggability/increased complexity of individual steps.

With respect to patches: I don’t think that any of them need to be separated – especially since separation might be somewhat complicated and/or error prone. Happy to live with your judgment call.

Henry

henry2cox avatar May 03 '22 15:05 henry2cox

To give an update on my progress - with the amount of changes, rebases and integration of other pull requests, the commit log on your branch grew extremely complex. I managed to create a linear history again by cherry-picking from your branch. The Next step will be squashing all of that together. That said, I still hope that I can extract some of the changes that are not directly tied to your diffcov implementation into separate commits.

oberpar avatar May 06 '22 14:05 oberpar

FWIW: the parallelization changes I just submitted improve performance on a multicore server approximately as:

  • genhtml: 3.5X - seems to saturate after about 8-10 cores. Note that that perforce is the bottleneck in our infrastructure. YMMV.
  • geninfo: Near linear up to about 16 cores. (i.e., about 12-14X for a 24-core machine) Note that this change causes geninfo to merge data as it collects (not to simply append) - which vastly reduces output file size at the cost of some CPU time - while also reducing parse time for subsequent operations.
  • lcov: (for merge-with-filter type operations): about 8X for 16 cores, not much improvement after that :-(

Your project, network, and filesystem may be different.

The submission adds a "--profile" flag which will cause some gross level timing information to be saved. The intent is to use it to guide further optimizations (eventually).

henry2cox avatar May 30 '22 11:05 henry2cox

I've split out two commits from your branch and added them to master:

  • 05cfc7e86d3f ("genhtml: Add option to change HTML footer text")
  • 1eb18f1deb50 ("genhtml: Add option to change HTML header text")

The current HEAD at 034353207a52 will be the base for the upcoming v1.16 release. After that I plan to squash your series as discussed previously and rebase that onto v1.16. I'll make the result + coding style changes available for your review before the final merge.

oberpar avatar Jun 02 '22 16:06 oberpar

Please merge. We need it! Rsrsrs.

paulocoutinhox avatar Aug 10 '22 15:08 paulocoutinhox

Thank you for your work on this, I've just stumbled over the need to "compare" two results and this seems to solved the problem.

GitMensch avatar Oct 18 '22 21:10 GitMensch

I'm not sure that this is related to your PR, but genhtml provide invalid HTML:

<td width="5%" class="headerCovTableHead" title="Uncovered New Code""><span class="tlaUNC">UNC</span></td>

^ second " after title

al-babych-fivetran avatar Nov 01 '22 16:11 al-babych-fivetran

Yeah…that’s me. Thanks for finding the bug. Will upstream a fix shortly.

Henry

3 Nov 2022: pushed a fix for the format issue, plus some features and debug aids for resource-constrained compute farm execution.

henry2cox avatar Nov 01 '22 16:11 henry2cox

Closing this P$ in favor of newly created #169 - which contains all the same code, but a single flattened commit.

henry2cox avatar Nov 10 '22 11:11 henry2cox