
WIP: Code coverage

Open Blebowski opened this issue 3 years ago • 1 comment

Hello,

this is the first PR with the reworked code coverage. It is not yet ready to be merged, but it is ready for review.

Code coverage is now split into statement, branch and toggle coverage. Each can be enabled separately. After simulation, coverage is dumped into a "covdb" file.

There is a new command -c which allows merging multiple "covdb" files into a single one and generating a report from the merged file. The coverage report is generated per hierarchy and allows navigating through the design.
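
The merge semantics can be illustrated with a small sketch (a Python model of the concept, not NVC's actual C implementation; the tag names below are made up for illustration): each database maps a hierarchical coverage tag to a hit count, and merging sums the counts so runs of different tests combine into one result.

```python
# Illustrative sketch of merging coverage databases.  Each database is a
# mapping from a hierarchical tag (hypothetical names) to a hit count.
def merge_covdbs(dbs):
    merged = {}
    for db in dbs:
        for tag, hits in db.items():
            # Sum hit counts so runs of different tests accumulate.
            merged[tag] = merged.get(tag, 0) + hits
    return merged

def coverage_percent(db):
    # An item counts as covered if it was hit at least once.
    covered = sum(1 for hits in db.values() if hits > 0)
    return 100.0 * covered / len(db) if db else 0.0
```

Summing (rather than simply OR-ing) the hit flags preserves how often each item was exercised, which a report can still collapse to covered/uncovered.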

Currently, the tests for coverage have been removed, since the new implementation significantly changes the original one.

TBD before merge:

  • Review and processing of all comments
  • Fixing a bug with toggle coverage on "out" ports of sub-instances -> so far it crashes, and I don't know why.
  • Extending HTML documentation with new command options.

TBD after merge (further improvements):

  • Support merging when the hierarchy path does not exactly match any tag; place the tag at the correct location in the hierarchy.
  • Encode tags in the coverage DB file "differentially" (don't store the whole hierarchy path) to reduce the size of the covdb file.
  • Enhance the coverage report: when clicking on a coverage element, display the source file where it is located (in a side panel) and highlight the statement/branch/signal.
  • Add a "coverage waiver file" which allows excluding statements/branches/toggles.
  • Add an option to enable coverage dumping only on some part of the hierarchy (the same as wave-opt).
  • Add an option to count X,Z -> 0/1 transitions as valid for toggle coverage.
  • Add a limit on the size of signals for which toggle coverage is measured (to reduce the performance impact). This is typically useful to avoid measuring toggle coverage on large memories.
  • Add "expression" coverage as the next coverage type. This will track combinations of input operands on logic expressions such as AND, OR, XOR, etc.
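
For reference, the idea behind toggle coverage can be sketched as follows (an illustrative Python model, not the actual NVC implementation): each bit of a signal must be observed both rising and falling to count as fully covered.

```python
# Illustrative model of toggle coverage: a bit is fully covered once it
# has been seen making both a 0->1 and a 1->0 transition.
class ToggleCounter:
    def __init__(self, width):
        self.rose = [False] * width   # bit saw a 0 -> 1 transition
        self.fell = [False] * width   # bit saw a 1 -> 0 transition
        self.prev = None

    def sample(self, value):
        # value is a list of 0/1 bits; X/Z handling is ignored in this sketch
        if self.prev is not None:
            for i, (old, new) in enumerate(zip(self.prev, value)):
                if old == 0 and new == 1:
                    self.rose[i] = True
                elif old == 1 and new == 0:
                    self.fell[i] = True
        self.prev = list(value)

    def percent(self):
        hits = sum(self.rose) + sum(self.fell)
        return 100.0 * hits / (2 * len(self.rose))
```

This also shows why a size limit on toggled signals matters: the bookkeeping grows linearly with signal width, so a large memory adds many bits that rarely toggle.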

Blebowski avatar Sep 18 '22 17:09 Blebowski

Toggle coverage bug:

====================================================

** Note: initialising [10ms +23412kB]
** Note: loading top-level unit [7ms +5100kB]
** Note: elaborating design [3ms +516kB]
** Note: generating intermediate code [2ms +0kB]
** Note: dumping coverage data [1ms +320kB]
** Note: saving library [0ms +0kB]
** Note: wrote LLVM IR to /DOKUMENTY/Projekty/nvc/cover_debug/sha-512-eval/build/spect_tb/_SPECT_TB.TB_TOP.elab.1.initial.ll
** Note: wrote LLVM IR to /DOKUMENTY/Projekty/nvc/cover_debug/sha-512-eval/build/spect_tb/_SPECT_TB.TB_TOP.elab.0.initial.ll
** Note: wrote LLVM IR to /DOKUMENTY/Projekty/nvc/cover_debug/sha-512-eval/build/spect_tb/_SPECT_TB.TB_TOP.elab.0.final.ll
** Note: wrote LLVM IR to /DOKUMENTY/Projekty/nvc/cover_debug/sha-512-eval/build/spect_tb/_SPECT_TB.TB_TOP.elab.1.final.ll
** Note: code generation for 39 units using 2 threads [118ms +24936kB]
** Note: linking shared library [1ms +0kB]
nvc: ../src/rt/model.c:2965: x_map_signal: Assertion `dst_n->net == NULL' failed.

*** Caught signal 6 (SIGABRT) ***

[0x55d6dd02dad0] ../src/util.c:805 signal_handler
[0x7f715bee741f] (/usr/lib/x86_64-linux-gnu/libpthread-2.31.so) 
[0x7f715bd2400b] (/usr/lib/x86_64-linux-gnu/libc-2.31.so) ../sysdeps/unix/sysv/linux/raise.c:51 raise
[0x7f715bd03858] (/usr/lib/x86_64-linux-gnu/libc-2.31.so) /build/glibc-SzIz7B/glibc-2.31/stdlib/abort.c:79 abort
[0x7f715bd03728] (/usr/lib/x86_64-linux-gnu/libc-2.31.so) /build/glibc-SzIz7B/glibc-2.31/assert/assert.c:92 __assert_fail_base.cold
[0x7f715bd14fd5] (/usr/lib/x86_64-linux-gnu/libc-2.31.so) /build/glibc-SzIz7B/glibc-2.31/assert/assert.c:101 __assert_fail
[0x55d6dd0bc805] ../src/rt/model.c:2965 x_map_signal
[0x7f71607938f8] ../rtl/dummy_module.vhd:27 SPECT_TB.TB_TOP.DUMMY_MODULE_INST.DUMMY_MODULE_NESTED_INST [VHDL]

This happens when an instance with an "out" port of std_logic type is instantiated and toggle coverage is enabled.

Blebowski avatar Sep 18 '22 18:09 Blebowski

This is a big patch, it's going to take some time to go through fully. Do you mind if I create some PRs for the code-coverage branch in your fork? I think that will be the quickest way to make progress.

nickg avatar Sep 19 '22 08:09 nickg

Hi, no problem, I don't expect to get it merged right away anyway (see the bug above, which occurs when a signal in a port map is not directly mapped to a signal in the enclosing scope), so please feel free to go ahead and create some merge requests on the forked branch.

As for the functionality, this is more or less what I am used to from commercial simulators. The format of the output report is quite "raw", since I think that for digital designers / verification engineers, fancy CSS does not really matter in the end.

Also, I thought it would be better to first merge this kind of "first working version" and then go on adding new features (see the list of things to add after merge) and regression tests.

Blebowski avatar Sep 19 '22 10:09 Blebowski

As well as the HTML report, could you also add a coverage sub-command like --summary that prints a text summary? That will make it much easier to do some basic testing by diffing the output, similar to the existing cover1.vhd that got deleted. It could print the percentage branch/statement coverage for each file/instance, or even just the percentage for the whole design like the old one did.

nickg avatar Sep 19 '22 20:09 nickg

As well as the HTML report, could you also add a coverage sub-command like --summary that prints a text summary? That will make it much easier to do some basic testing by diffing the output, similar to the existing cover1.vhd that got deleted. It could print the percentage branch/statement coverage for each file/instance, or even just the percentage for the whole design like the old one did.

Actually, if you run -c with -V (the same kind of verbose output as for elab), you will get more verbose info, e.g. in my debug design:

** Note: initialising [13ms +22340kB]
** Note: Loading input coverage database: cov_dbs/tb_top_G0.covdb [0ms +0kB]
** Note: Loading input coverage database: cov_dbs/tb_top_G1.covdb [0ms +1156kB]
** Note: Loading input coverage database: cov_dbs/tb_top_G2.covdb [1ms +116kB]
** Note: Saving merged coverage database to: cov_dbs/tb_top_merged.covdb [2ms +8kB]
** Note: Generating coverage report to folder: coverage_output. [0ms +0kB]
** Note: Code coverage results on: SPECT_TB.TB_TOP
** Note:      Statement:  87.5 %
** Note:      Branch:     68.8 %
** Note:      Toggle:     8.8 %

I have used the same progress reporting as you use for elab. Right now the coverage merging time is negligible, but with more complex designs it might take a non-trivial amount of time, and I would like to have some way to "benchmark" it.

Blebowski avatar Sep 20 '22 20:09 Blebowski

OK, @nickg, it is now ready for review. I managed to elaborate and run a bit of my fairly complex design (the CTU CAN FD IP core - so far I have tried only the RTL top, as I don't have the whole TB ported to NVC) with all coverage types enabled. Btw, the performance impact is immediately visible:

Without coverage:

** Note: initialising [8ms +23460kB]
** Note: loading top-level unit [12ms +7264kB]
** Note: elaborating design [450ms +35640kB]
** Note: generating intermediate code [320ms +29352kB]
** Note: saving library [22ms +264kB]
** Note: code generation for 5144 units using 7 threads [25096ms +76316kB]
** Note: linking shared library [7ms +0kB]

With all coverage types enabled:

Elaborating:
** Note: initialising [9ms +23404kB]
** Note: loading top-level unit [11ms +7328kB]
** Note: elaborating design [454ms +35644kB]
** Note: generating intermediate code [343ms +36800kB]
** Note: dumping coverage data [20ms +264kB]
** Note: saving library [28ms +0kB]
** Note: code generation for 5144 units using 7 threads [32820ms +84024kB]

But that's something that is expected.

Blebowski avatar Sep 30 '22 19:09 Blebowski

** Note: code generation for 5144 units using 7 threads [32820ms +84024kB]

OK, it doesn't seem too bad. The slow code generation with LLVM is something I'm working on separately anyway. Do you know which of statement/branch/toggle coverage has the largest effect?

nickg avatar Oct 05 '22 20:10 nickg

I also expected worse, but this is only elaboration; I would be curious about the simulation.

The individual coverage types give the following results:

No coverage:

Elaborating:
** Note: initialising [16ms +23244kB]
** Note: loading top-level unit [15ms +7204kB]
** Note: elaborating design [446ms +35880kB]
** Note: generating intermediate code [236ms +29228kB]
** Note: saving library [26ms +0kB]
** Note: code generation for 5144 units [24592ms +82592kB]
** Note: linking shared library [18ms +0kB]

Statement only:

Elaborating:
** Note: initialising [14ms +23632kB]
** Note: loading top-level unit [16ms +7140kB]
** Note: elaborating design [468ms +35868kB]
** Note: generating intermediate code [256ms +31336kB]
** Note: dumping coverage data [12ms +264kB]
** Note: saving library [26ms +0kB]
** Note: code generation for 5144 units [29448ms +83380kB]
** Note: linking shared library [22ms +0kB]

Branch only:

Elaborating:
** Note: initialising [15ms +23284kB]
** Note: loading top-level unit [13ms +7268kB]
** Note: elaborating design [429ms +35836kB]
** Note: generating intermediate code [233ms +31840kB]
** Note: dumping coverage data [9ms +264kB]
** Note: saving library [27ms +0kB]
** Note: code generation for 5144 units [27165ms +81568kB]
** Note: linking shared library [21ms +488kB]

Toggle only:

** Note: initialising [8ms +23408kB]
** Note: loading top-level unit [10ms +7332kB]
** Note: elaborating design [450ms +35700kB]
** Note: generating intermediate code [247ms +34312kB]
** Note: dumping coverage data [15ms +264kB]
** Note: saving library [31ms +0kB]
** Note: code generation for 5144 units [25430ms +85972kB]
** Note: linking shared library [17ms +1052kB]

All coverage types:

Elaborating:
** Note: initialising [10ms +23404kB]
** Note: loading top-level unit [20ms +7332kB]
** Note: elaborating design [453ms +35720kB]
** Note: generating intermediate code [259ms +36628kB]
** Note: dumping coverage data [25ms +264kB]
** Note: saving library [23ms +0kB]
** Note: code generation for 5144 units [31899ms +89000kB]
** Note: linking shared library [18ms +240kB]

I also re-ran the no-coverage case, because last time I had Docker running on the side with a GHDL simulation, so the numbers were not 100% comparable. Now all measurements have the same "background" noise.
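
A quick calculation over the code-generation timings above (one measurement each, so treat the percentages as rough) gives the relative overhead of each coverage type:

```python
# Code-generation times in milliseconds for 5144 units, taken from the
# logs quoted above, and the resulting relative overhead in percent.
base = 24592  # no coverage
times = {"statement": 29448, "branch": 27165, "toggle": 25430, "all": 31899}
overhead = {k: round(100.0 * (v - base) / base, 1) for k, v in times.items()}
```

So by these numbers, statement coverage has the largest effect on code generation (roughly 20% slower), branch about 10%, toggle about 3%, and all types together about 30%.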

Blebowski avatar Oct 05 '22 21:10 Blebowski

Hello @nickg, do you have any news regarding coverage? Did you happen to have time to review it? I am wondering whether I should start working on some of the "TODOs" from the first comment in this MR, or whether I should wait and expect some major changes due to the review.

PS: I don't intend to put any pressure here, so please take any time you need; I am just wondering how to proceed further on my side.

Blebowski avatar Oct 09 '22 13:10 Blebowski

Can you have a look at the open PR for your branch? https://github.com/Blebowski/nvc/pull/3

nickg avatar Oct 09 '22 14:10 nickg

Oh, thanks, I completely missed this one. I should probably set up some email notifications on my GitHub repos...

Blebowski avatar Oct 09 '22 14:10 Blebowski

Oh, OK, I thought it sent emails by default.

nickg avatar Oct 09 '22 14:10 nickg

Hi @nickg, thanks for the review. I will look at all of it (including your last MR), but I am swamped by my regular work duties these days, so it's going to take some time.

Blebowski avatar Oct 20 '22 21:10 Blebowski

That's fine, but you might want to try rebasing at some point soon, as I made changes to some of the related files (mainly src/lower.c).

nickg avatar Oct 20 '22 21:10 nickg

I have processed most of the remarks; I still need to fix the tests.

I completely removed the implicit labelling of branches and statements from the parser and moved it all into the coverage implementation, to better decouple it from the rest of the code (as you suggested). Looking at it now, it makes more sense.

Further, I nest a coverage scope on almost every tree item. It is easier to handle in the code, and it makes the statements within large processes independent of each other (since they are not linearly labelled).

Blebowski avatar Oct 30 '22 20:10 Blebowski

Removing implicit statement labelling completely from the parser had only one issue, in p_component_instantiation. I worked around it by not inserting NULL into the symbol table. Since a parse error is thrown in this case, IMHO it's unimportant anyway, but I am curious about your opinion.

The same applies to the procedure call in test_parse: since there is no implicit labelling, assertions which relied on either an _SX or an old __lineXYZ label existing were failing. I removed them, since the labels no longer exist.

Let me know whether it is OK like this, or whether you would like the previous "__lineXYZ" labelling back. I would like to avoid keeping part of the _SX labelling in the parser when the same thing is done in cover.c.

Blebowski avatar Nov 02 '22 00:11 Blebowski

Let me know whether it is OK like this, or whether you would like the previous "__lineXYZ" labelling back. I would like to avoid keeping part of the _SX labelling in the parser when the same thing is done in cover.c.

I think this is OK but could you make a separate PR for it?

nickg avatar Nov 02 '22 07:11 nickg

OK, I will do that. Btw, what about the failing Ubuntu test? Should I implement JIT IR support for the cover VCODE ops? Also, what is the architecture behind the JIT? I understand that you want to speed up code generation, but I am somewhat lost on the flow of data / compilation (I have practically zero experience with JIT compilers). Is the main idea to replace the LLVM API with directly emitted LLVM IR (or is it a custom IR) to make it faster? Any "lecture for dummies" is appreciated :)

Blebowski avatar Nov 02 '22 09:11 Blebowski

What about the failing Ubuntu test? Should I implement JIT IR support for the cover VCODE ops?

Don't worry about that for now, I'll fix it up later. If you restore the cover flag for cover1 in testlist.txt it should get skipped in that mode.

Also, what is the architecture behind the JIT? I understand that you want to speed up code generation, but I am somewhat lost on the flow of data / compilation (I have practically zero experience with JIT compilers). Is the main idea to replace the LLVM API with directly emitted LLVM IR (or is it a custom IR) to make it faster? Any "lecture for dummies" is appreciated :)

Previously the flow looked like:

VHDL --> vcode --> LLVM ahead-of-time for runtime
              \--> Evaluator for constant folding / elaboration

And in the future it's going to be

VHDL ----\                      /---> LLVM ahead-of-time for IEEE, etc. libraries
Verilog ---> vcode ---> JIT IR -----> LLVM JIT for frequently executed code
PSL -----/                      |---> Interpreter for elaboration / infrequently executed
...                             \---> (Maybe) lightweight code generator like GHDL mcode

The problems with the old approach are:

  • The evaluator and the LLVM vcode backends were totally separate so every new vcode operation had to be implemented in two places. And historically the evaluator was both slow and buggy (sometimes gave different results if something was evaluated at elaboration time vs runtime). I also want to do some refactoring of how vcode works and this doubles the number of consumers that need updating.
  • All the generated code has to be passed through LLVM which is very slow (because it's designed to generate high quality code). But a lot of VHDL code is either very simple so it doesn't benefit from the analysis LLVM does or is only executed once (e.g. initialisation functions).
  • I get the impression that most users re-elaborate the design every time they run it (e.g. nvc -e foo -r). That's different to normal software compilation where the program is compiled once and run thousands of times.

The new JIT IR is designed to be quite low level (it looks a bit like a simplified RISC assembly language) so it shouldn't have to change often and is also easy to write an efficient interpreter for. So now there's one pass that converts vcode IR into JIT IR (src/jit/jit-irgen.c) and that can either be interpreted (jit-interp.c) or compiled via LLVM (jit-llvm.c). The latter isn't used at the moment unless you pass a special configure flag, but eventually it will replace the existing vcode->LLVM translation in cgen.c.
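
To make the "easy to write an efficient interpreter for" point concrete, here is a toy interpreter for a minimal register-based IR (purely illustrative; NVC's actual JIT IR, opcodes and encoding differ): the dispatch loop stays small precisely because every operation is simple and explicit.

```python
# Toy interpreter for a minimal register-based IR, illustrating why a
# low-level, RISC-like IR is cheap to interpret.  Opcodes are invented
# for this sketch and are not NVC's.
def interpret(program, nregs=8):
    regs = [0] * nregs
    pc = 0
    while pc < len(program):
        op, *args = program[pc]
        if op == "mov":    # mov dst, imm
            regs[args[0]] = args[1]
        elif op == "add":  # add dst, src1, src2
            regs[args[0]] = regs[args[1]] + regs[args[2]]
        elif op == "jnz":  # jnz src, target: branch if register non-zero
            if regs[args[0]] != 0:
                pc = args[1]
                continue
        elif op == "ret":  # ret src: return register value
            return regs[args[0]]
        pc += 1
    return None
```

A JIT backend would translate the same instruction list into machine code instead of dispatching it, which is why sharing one IR between an interpreter and a compiler avoids the duplicated-backend problem described above.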

nickg avatar Nov 05 '22 14:11 nickg

What about the failing Ubuntu test? Should I implement JIT IR support for the cover VCODE ops?

Don't worry about that for now, I'll fix it up later. If you restore the cover flag for cover1 in testlist.txt it should get skipped in that mode.

OK

Also, what is the architecture behind the JIT? I understand that you want to speed up code generation, but I am somewhat lost on the flow of data / compilation (I have practically zero experience with JIT compilers). Is the main idea to replace the LLVM API with directly emitted LLVM IR (or is it a custom IR) to make it faster? Any "lecture for dummies" is appreciated :)

Previously the flow looked like:

VHDL --> vcode --> LLVM ahead-of-time for runtime
              \--> Evaluator for constant folding / elaboration

And in the future it's going to be

VHDL ----\                      /---> LLVM ahead-of-time for IEEE, etc. libraries
Verilog ---> vcode ---> JIT IR -----> LLVM JIT for frequently executed code
PSL -----/                      |---> Interpreter for elaboration / infrequently executed
...                             \---> (Maybe) lightweight code generator like GHDL mcode

The problems with the old approach are:

  • The evaluator and the LLVM vcode backends were totally separate so every new vcode operation had to be implemented in two places. And historically the evaluator was both slow and buggy (sometimes gave different results if something was evaluated at elaboration time vs runtime). I also want to do some refactoring of how vcode works and this doubles the number of consumers that need updating.
  • All the generated code has to be passed through LLVM which is very slow (because it's designed to generate high quality code). But a lot of VHDL code is either very simple so it doesn't benefit from the analysis LLVM does or is only executed once (e.g. initialisation functions).

I see, that makes sense; doing "on the fly" generation for infrequently invoked parts of the code is IMHO a good option. Also, it allows recompiling the code if some of it needs to change, e.g. due to increased visibility of some of its parts for debug. AFAIK, most commercial tools do it the same way: once you launch the tool (post-elab), you can "enable" dumping part of the design, which recompiles that part on the fly and inserts hooks into the code upon signal changes.

  • I get the impression that most users re-elaborate the design every time they run it (e.g. nvc -e foo -r). That's different to normal software compilation where the program is compiled once and run thousands of times.

That's true, since most of the time when the code is launched it is being debugged, so you need to change it and re-analyze. For regression testing, you can avoid elaborating multiple times in some cases (e.g. if it's possible to pass a seed to the randomization initialization at run time and you run the same test with multiple randomization seeds). However, if you pass e.g. the test name via a top-level generic, you need to re-elaborate the whole TB each time.

The new JIT IR is designed to be quite low level (it looks a bit like a simplified RISC assembly language) so it shouldn't have to change often and is also easy to write an efficient interpreter for. So now there's one pass that converts vcode IR into JIT IR (src/jit/jit-irgen.c) and that can either be interpreted (jit-interp.c) or compiled via LLVM (jit-llvm.c). The latter isn't used at the moment unless you pass a special configure flag, but eventually it will replace the existing vcode->LLVM translation in cgen.c.

Thanks for the explanation. Btw, aren't you afraid that, with one more IR inserted into the process, the elaboration is going to be even slower in the end?

Blebowski avatar Nov 05 '22 16:11 Blebowski

On my side the MR is ready; is there something else I should look at?

Edit: OK, I checked the MR in my fork - again, no notification, which is strange. I cherry-picked, tried and pushed the changes into this MR.

Blebowski avatar Nov 06 '22 11:11 Blebowski

Sorry, could you change one more thing please: the constant VCODE_VERSION in vcode.c needs to be incremented, because some of the instructions changed in a backwards-incompatible way.

nickg avatar Nov 06 '22 16:11 nickg

Sorry, could you change one more thing please: the constant VCODE_VERSION in vcode.c needs to be incremented, because some of the instructions changed in a backwards-incompatible way.

Hi, no problem, I will do that. I am currently trying to debug the report generation on a "real" project, and I can see that the way statements / branches are reported for conditional / selected signal assignments is not optimal. Plus, I ran across a few remaining bugs there, so I am fixing those :)

Blebowski avatar Nov 06 '22 16:11 Blebowski

@nickg, the MR is now ready from my side. Testing on a larger design revealed my foolishness in the code which reallocs the chains. I fixed it in a slightly hackish way (instead of using a double pointer), but for now it works. The rest is up to further refactoring, but I would prefer getting it into master. Maybe some other users will start using it and reporting bugs.

Blebowski avatar Nov 08 '22 08:11 Blebowski