Improve O(n^2) algorithms
@wsnyder would you mind if I recode some of this in C++ (and which edition?), and maybe use mmap for reading the files (requires POSIX host, and 64-bit platform for large files)? I'm trying to diff 20GB VCD files and and just the signal matching takes forever due to the O(N^2) algorithm.
Go crazy ;)
I'd suggest:
- Commit to add at least one test. (Suggest "make test" with no python driver.)
- Add github action test & format.
- Uplift to c++ with strong warnings enabled, but no functional change. As we want to use this with Verilator tests, C++14 for now would be my suggestion (or stay at C++11).
- Fix algorithm
You can merge as you feel fit, I probably won't review unless something seems interesting or you ask.
There is a bit of a fundamental question about what we want this tool to do. One of the reasons it's super slow, is because if there is an entry in one file, but not in the other (even if the values haven't changed), it will look ahead in the other file:
File A File B
... omitted, everything matches up to here.
#time1 #time1
b0000 sigA b0000 sigB // Matches
#time2 #time2
b0001 sigA // No corresponding entry in file B
#time3 #time3
... lots of stuff here ...
#time1000000 #time1000000
b0001 sigA // It will search for this due to no entry for sigA at #time2
after searching from #time2 until #time1000000 for the entry for sigA, it will then rewind to #time2 and continue with the next entry.
From usage perspective this doesn't make much sense and the reported information that the next change entry for sigA in file B is zillions of cycles later is not particularly useful. We already know there is a mismatch when we see the #time3 entry in file B.
I think all we care about is whether the values that would be displayed in a viewer are the same or not, so as an alternative, I propose that we keep the current state of all signals in both files, and then:
for each unique time point in both files:
- Apply all changes to current state up to that time point
- Check state of signals that were updated match at that time point
This will basically report as soon as the states diverge, and requires only a single pass over the files.
BTW I'm looking at this because the last DFG patch in Verilator (verilator/verilator#6096) changed the cycle count of one of the cases. Not sure if it's an optimzation bug or a race condition in the design.
You suggest the way I would have done it. ;)
In terms of how it's used it ideally would report differences based on these cases:
- Signal added, or removed
- Signal renamed (for simplicity for now can just list these as signal add/removes)
- Scope added or removed (for simplicity for now can just list these as signal add/removes)
- Scope changed but signal base names same (for simplicity for now can just list these as signal add/removes)
- New or removed signal transition at a time
Sorry, this last one is a bit unclear. Do you mean:
- Report header changes (signal/scope add/remove/type change/etc) - your first 4 bullet points
- Report value divergence of signals present in both files
Not sure what you mean by the last bullet
Correct.
Ok, I will spend a some time on this as stuff like verilator/verilator#6100 or similar are likely to come up more often now that we run complex designs via RTLMeter regularly and vcddiff is super helpful for that.
std::string_view would be very handy, but is C++17.
https://veripool.org/guide/latest/deprecations.html mentiones the deadline for C++20 is May 2025, which has just passed. Would this be a good time to pull the trigger on C++20 in Verilator, and then we can make this match? If so I will do the Verilator work for the upgrade first.
For c++20 we need a configure error that can be overridden for at least a full release. Want to make that, or I can.
That sounds like it might be better you do it then, as it sounds like you know what is needed (or I'm happy to, if you explain in detail)