Change nvbench_compare statuses
Currently, when I compare a baseline with new result JSON using nvbench_compare.py base.json new.json, I get output like the following:
| ValueType | Ref Time | Ref Noise | Cmp Time | Cmp Noise | Diff | %Diff | Status |
|---|---|---|---|---|---|---|---|
| I8 | 12.699 ms | 8.71% | 12.481 ms | 10.17% | -217.438 us | -1.71% | PASS |
| I16 | 12.514 ms | 8.06% | 12.513 ms | 5.98% | -0.956 us | -0.01% | PASS |
| I32 | 12.899 ms | 10.22% | 12.091 ms | 2.15% | -808.662 us | -6.27% | FAIL |
| I64 | 12.073 ms | 1.19% | 12.051 ms | 0.81% | -21.432 us | -0.18% | PASS |
| __int128 | 12.188 ms | 4.55% | 12.833 ms | 2.56% | 644.844 us | 5.29% | FAIL |
With PASS colored green, and `FAIL colored red. It would be more useful if the the Status column could not only highlight significant changes, but show me whether the timings improved or regressed. Therefore, I propose an output like this (only status column changed):
| ValueType | Ref Time | Ref Noise | Cmp Time | Cmp Noise | Diff | %Diff | Status |
|---|---|---|---|---|---|---|---|
| I8 | 12.699 ms | 8.71% | 12.481 ms | 10.17% | -217.438 us | -1.71% | PASS |
| I16 | 12.514 ms | 8.06% | 12.513 ms | 5.98% | -0.956 us | -0.01% | PASS |
| I32 | 12.899 ms | 10.22% | 12.091 ms | 2.15% | -808.662 us | -6.27% | FAST |
| I64 | 12.073 ms | 1.19% | 12.051 ms | 0.81% | -21.432 us | -0.18% | PASS |
| __int128 | 12.188 ms | 4.55% | 12.833 ms | 2.56% | 644.844 us | 5.29% | SLOW |
With PASS uncolored, FAST colored green. and SLOW colored red.
You can of course pick any better word for status, or different colors. But there is value in differentiating whether the time became faster or slower.
Another question is whether these new statuses should appear by default or you would want to have a CLI switch between the PASS/FAIL and the PASS/SLOW/FAST.
I support this. The comparison script is still very much a work in progress and has much room for improvement. Clarifying whether the test was faster or slower would be a nice QoL change.
One suggestion: I don't think PASS fits with FAST and SLOW anymore. Maybe SAME for when the new samples are within the tolerance?
A related improvement would be to make the coloring more sensible, adding --color and --no-color flags to explicitly set the color state, and detect whether the output device supports color by default. IIRC we always use color, and this gets annoying when piping to a file, etc.