vcddiff icon indicating copy to clipboard operation
vcddiff copied to clipboard

Improve O(n^2) algorithms

Open gezalore opened this issue 7 months ago • 9 comments

@wsnyder would you mind if I recode some of this in C++ (and which edition?), and maybe use mmap for reading the files (requires POSIX host, and 64-bit platform for large files)? I'm trying to diff 20GB VCD files and and just the signal matching takes forever due to the O(N^2) algorithm.

gezalore avatar Jun 17 '25 11:06 gezalore

Go crazy ;)

I'd suggest:

  1. Commit to add at least one test. (Suggest "make test" with no python driver.)
  2. Add github action test & format.
  3. Uplift to c++ with strong warnings enabled, but no functional change. As we want to use this with Verilator tests, C++14 for now would be my suggestion (or stay at C++11).
  4. Fix algorithm

You can merge as you feel fit, I probably won't review unless something seems interesting or you ask.

wsnyder avatar Jun 17 '25 12:06 wsnyder

There is a bit of a fundamental question about what we want this tool to do. One of the reasons it's super slow, is because if there is an entry in one file, but not in the other (even if the values haven't changed), it will look ahead in the other file:

File A           File B

... omitted, everything matches up to here.

#time1            #time1
b0000 sigA        b0000 sigB  // Matches
#time2            #time2
b0001 sigA                    // No corresponding entry in file B
#time3            #time3
... lots of stuff here ...
#time1000000      #time1000000
                  b0001 sigA    // It will search for this due to no entry for sigA at #time2

after searching from #time2 until #time1000000 for the entry for sigA, it will then rewind to #time2 and continue with the next entry.

From usage perspective this doesn't make much sense and the reported information that the next change entry for sigA in file B is zillions of cycles later is not particularly useful. We already know there is a mismatch when we see the #time3 entry in file B.

I think all we care about is whether the values that would be displayed in a viewer are the same or not, so as an alternative, I propose that we keep the current state of all signals in both files, and then:

for each unique time point in both files:

  • Apply all changes to current state up to that time point
  • Check state of signals that were updated match at that time point

This will basically report as soon as the states diverge, and requires only a single pass over the files.

gezalore avatar Jun 17 '25 13:06 gezalore

BTW I'm looking at this because the last DFG patch in Verilator (verilator/verilator#6096) changed the cycle count of one of the cases. Not sure if it's an optimzation bug or a race condition in the design.

gezalore avatar Jun 17 '25 13:06 gezalore

You suggest the way I would have done it. ;)

In terms of how it's used it ideally would report differences based on these cases:

  • Signal added, or removed
  • Signal renamed (for simplicity for now can just list these as signal add/removes)
  • Scope added or removed (for simplicity for now can just list these as signal add/removes)
  • Scope changed but signal base names same (for simplicity for now can just list these as signal add/removes)
  • New or removed signal transition at a time

wsnyder avatar Jun 17 '25 13:06 wsnyder

Sorry, this last one is a bit unclear. Do you mean:

  1. Report header changes (signal/scope add/remove/type change/etc) - your first 4 bullet points
  2. Report value divergence of signals present in both files

Not sure what you mean by the last bullet

gezalore avatar Jun 17 '25 13:06 gezalore

Correct.

wsnyder avatar Jun 17 '25 14:06 wsnyder

Ok, I will spend a some time on this as stuff like verilator/verilator#6100 or similar are likely to come up more often now that we run complex designs via RTLMeter regularly and vcddiff is super helpful for that.

std::string_view would be very handy, but is C++17.

https://veripool.org/guide/latest/deprecations.html mentiones the deadline for C++20 is May 2025, which has just passed. Would this be a good time to pull the trigger on C++20 in Verilator, and then we can make this match? If so I will do the Verilator work for the upgrade first.

gezalore avatar Jun 17 '25 15:06 gezalore

For c++20 we need a configure error that can be overridden for at least a full release. Want to make that, or I can.

wsnyder avatar Jun 17 '25 16:06 wsnyder

That sounds like it might be better you do it then, as it sounds like you know what is needed (or I'm happy to, if you explain in detail)

gezalore avatar Jun 17 '25 16:06 gezalore