
Also compare batch measurements in nvbench_compare.py

Open bernhardmgruber opened this issue 6 months ago • 1 comments

Fixes: #247

Cold and batch measurements can sometimes differ substantially, so we want to show both. An example is kernels using PDL (Programmatic Dependent Launch).

Here is a comparison of DeviceTransform with and without PDL (see also https://github.com/NVIDIA/cccl/pull/5249). Columns prefixed with `B` hold the batch measurements:

# mul

## [0] NVIDIA B200

|  T{ct}  |  OffsetT{ct}  |  Elements{io}  |   Ref Time |   Ref Noise |   Cmp Time |   Cmp Noise |      Diff |   %Diff |  Status  |   B Ref Time |   B Cmp Time |    B Diff |   B %Diff |  B Status  |
|---------|---------------|----------------|------------|-------------|------------|-------------|-----------|---------|----------|--------------|--------------|-----------|-----------|------------|
|   I8    |      I32      |      2^16      |   5.729 us |      11.77% |   5.734 us |      11.65% |  0.005 us |   0.08% |   SAME   |     4.094 us |     1.622 us | -2.472 us |   -60.39% |    FAST    |
|   I8    |      I32      |      2^20      |   6.016 us |       5.61% |   6.130 us |       7.31% |  0.114 us |   1.90% |   SAME   |     4.101 us |     2.831 us | -1.270 us |   -30.97% |    FAST    |
|   I8    |      I32      |      2^24      |  13.739 us |       5.56% |  13.620 us |       5.84% | -0.119 us |  -0.87% |   SAME   |    10.271 us |     8.388 us | -1.883 us |   -18.33% |    FAST    |
|   I8    |      I32      |      2^28      | 114.526 us |       0.25% | 114.550 us |       0.28% |  0.024 us |   0.02% |   SAME   |   112.599 us |   110.039 us | -2.560 us |    -2.27% |    FAST    |
|   I8    |      I64      |      2^16      |   5.494 us |      14.80% |   5.512 us |      14.22% |  0.018 us |   0.33% |   SAME   |     4.094 us |     1.594 us | -2.500 us |   -61.06% |    FAST    |
|   I8    |      I64      |      2^20      |   6.059 us |       5.71% |   6.383 us |       9.59% |  0.324 us |   5.35% |   SAME   |     4.101 us |     2.826 us | -1.274 us |   -31.07% |    FAST    |
|   I8    |      I64      |      2^24      |  14.075 us |       2.96% |  14.041 us |       3.52% | -0.035 us |  -0.25% |   SAME   |    10.290 us |     8.436 us | -1.854 us |   -18.02% |    FAST    |
|   I8    |      I64      |      2^28      | 115.311 us |       0.80% | 115.433 us |       0.82% |  0.122 us |   0.11% |   SAME   |   112.952 us |   110.872 us | -2.079 us |    -1.84% |    FAST    |
|   I16   |      I32      |      2^16      |   5.346 us |      16.26% |   5.377 us |      15.65% |  0.031 us |   0.58% |   SAME   |     4.094 us |     1.599 us | -2.496 us |   -60.96% |    FAST    |
|   I16   |      I32      |      2^20      |   6.099 us |       6.33% |   6.380 us |       9.76% |  0.281 us |   4.61% |   SAME   |     4.099 us |     2.815 us | -1.285 us |   -31.34% |    FAST    |
|   I16   |      I32      |      2^24      |  16.382 us |       3.32% |  16.376 us |       3.23% | -0.006 us |  -0.04% |   SAME   |    12.288 us |    10.074 us | -2.214 us |   -18.02% |    FAST    |
|   I16   |      I32      |      2^28      | 161.785 us |       0.28% | 161.935 us |       0.44% |  0.150 us |   0.09% |   SAME   |   159.618 us |   156.623 us | -2.996 us |    -1.88% |    FAST    |
|   I16   |      I64      |      2^16      |   5.672 us |      12.72% |   5.756 us |      11.04% |  0.084 us |   1.48% |   SAME   |     4.094 us |     1.577 us | -2.518 us |   -61.49% |    FAST    |
|   I16   |      I64      |      2^20      |   6.129 us |       6.10% |   6.391 us |      10.08% |  0.262 us |   4.28% |   SAME   |     4.102 us |     2.816 us | -1.286 us |   -31.36% |    FAST    |
|   I16   |      I64      |      2^24      |  16.434 us |       3.20% |  16.467 us |       3.44% |  0.034 us |   0.21% |   SAME   |    12.289 us |    10.120 us | -2.169 us |   -17.65% |    FAST    |
|   I16   |      I64      |      2^28      | 163.663 us |       0.22% | 163.721 us |       0.16% |  0.058 us |   0.04% |   SAME   |   160.345 us |   158.210 us | -2.135 us |    -1.33% |    FAST    |
|   F32   |      I32      |      2^16      |   5.935 us |       6.94% |   5.907 us |       6.24% | -0.028 us |  -0.47% |   SAME   |     4.094 us |     1.595 us | -2.499 us |   -61.04% |    FAST    |
|   F32   |      I32      |      2^20      |   7.543 us |      10.91% |   7.497 us |      10.90% | -0.046 us |  -0.61% |   SAME   |     4.099 us |     2.856 us | -1.244 us |   -30.34% |    FAST    |
|   F32   |      I32      |      2^24      |  25.470 us |       3.65% |  25.395 us |       3.59% | -0.075 us |  -0.29% |   SAME   |    20.843 us |    19.593 us | -1.251 us |    -6.00% |    FAST    |
|   F32   |      I32      |      2^28      | 313.226 us |       0.32% | 313.291 us |       0.30% |  0.065 us |   0.02% |   SAME   |   311.189 us |   308.332 us | -2.857 us |    -0.92% |    SAME    |
|   F32   |      I64      |      2^16      |   5.545 us |      14.41% |   5.487 us |      14.25% | -0.058 us |  -1.05% |   SAME   |     4.094 us |     1.600 us | -2.494 us |   -60.92% |    FAST    |
|   F32   |      I64      |      2^20      |   7.494 us |      11.26% |   7.432 us |      11.17% | -0.062 us |  -0.83% |   SAME   |     4.099 us |     2.859 us | -1.241 us |   -30.26% |    FAST    |
|   F32   |      I64      |      2^24      |  25.602 us |       3.56% |  25.608 us |       3.53% |  0.006 us |   0.02% |   SAME   |    20.726 us |    19.591 us | -1.136 us |    -5.48% |    FAST    |
|   F32   |      I64      |      2^28      | 313.271 us |       0.33% | 313.254 us |       0.31% | -0.017 us |  -0.01% |   SAME   |   311.100 us |   308.282 us | -2.818 us |    -0.91% |    SAME    |
|   F64   |      I32      |      2^16      |   5.706 us |      11.69% |   5.719 us |      11.38% |  0.014 us |   0.24% |   SAME   |     4.094 us |     1.596 us | -2.498 us |   -61.01% |    FAST    |
|   F64   |      I32      |      2^20      |   8.084 us |       4.22% |   8.043 us |       4.93% | -0.041 us |  -0.51% |   SAME   |     4.108 us |     2.882 us | -1.226 us |   -29.85% |    FAST    |
|   F64   |      I32      |      2^24      |  45.629 us |       2.07% |  45.498 us |       1.93% | -0.131 us |  -0.29% |   SAME   |    43.046 us |    40.308 us | -2.738 us |    -6.36% |    FAST    |
|   F64   |      I32      |      2^28      | 620.092 us |       0.23% | 620.155 us |       0.19% |  0.063 us |   0.01% |   SAME   |   617.714 us |   614.871 us | -2.843 us |    -0.46% |    SAME    |
|   F64   |      I64      |      2^16      |   5.698 us |      11.99% |   5.698 us |      11.30% | -0.001 us |  -0.01% |   SAME   |     4.094 us |     1.616 us | -2.478 us |   -60.53% |    FAST    |
|   F64   |      I64      |      2^20      |   8.098 us |       4.25% |   8.032 us |       4.07% | -0.067 us |  -0.82% |   SAME   |     4.106 us |     2.896 us | -1.210 us |   -29.47% |    FAST    |
|   F64   |      I64      |      2^24      |  45.517 us |       2.01% |  45.637 us |       2.05% |  0.119 us |   0.26% |   SAME   |    43.031 us |    40.252 us | -2.779 us |    -6.46% |    FAST    |
|   F64   |      I64      |      2^28      | 620.032 us |       0.22% | 620.114 us |       0.22% |  0.082 us |   0.01% |   SAME   |   617.629 us |   614.842 us | -2.786 us |    -0.45% |    SAME    |
|  I128   |      I32      |      2^16      |   5.959 us |       4.88% |   5.917 us |       5.20% | -0.042 us |  -0.71% |   SAME   |     4.094 us |     1.596 us | -2.499 us |   -61.03% |    FAST    |
|  I128   |      I32      |      2^20      |  10.308 us |       4.86% |  10.378 us |       5.50% |  0.070 us |   0.67% |   SAME   |     6.141 us |     5.100 us | -1.042 us |   -16.97% |    FAST    |
|  I128   |      I32      |      2^24      |  83.570 us |       0.98% |  83.639 us |       0.89% |  0.068 us |   0.08% |   SAME   |    81.530 us |    78.615 us | -2.916 us |    -3.58% |    FAST    |
|  I128   |      I32      |      2^28      |   1.235 ms |       0.15% |   1.235 ms |       0.15% | -0.010 us |  -0.00% |   SAME   |     1.233 ms |     1.231 ms | -2.248 us |    -0.18% |    SAME    |
|  I128   |      I64      |      2^16      |   5.979 us |       4.55% |   5.995 us |       4.38% |  0.016 us |   0.27% |   SAME   |     4.094 us |     1.601 us | -2.493 us |   -60.90% |    FAST    |
|  I128   |      I64      |      2^20      |  10.320 us |       4.89% |  10.433 us |       6.27% |  0.113 us |   1.10% |   SAME   |     6.141 us |     5.091 us | -1.050 us |   -17.10% |    FAST    |
|  I128   |      I64      |      2^24      |  83.769 us |       0.74% |  83.803 us |       0.77% |  0.034 us |   0.04% |   SAME   |    81.544 us |    78.602 us | -2.942 us |    -3.61% |    FAST    |
|  I128   |      I64      |      2^28      |   1.234 ms |       0.15% |   1.234 ms |       0.14% | -0.059 us |  -0.00% |   SAME   |     1.233 ms |     1.231 ms | -1.907 us |    -0.15% |    SAME    |

# add

## [0] NVIDIA B200

|  T{ct}  |  OffsetT{ct}  |  Elements{io}  |   Ref Time |   Ref Noise |   Cmp Time |   Cmp Noise |       Diff |   %Diff |  Status  |   B Ref Time |   B Cmp Time |     B Diff |   B %Diff |  B Status  |
|---------|---------------|----------------|------------|-------------|------------|-------------|------------|---------|----------|--------------|--------------|------------|-----------|------------|
|   I8    |      I32      |      2^16      |   5.829 us |       9.25% |   5.838 us |       8.67% |   0.009 us |   0.16% |   SAME   |     4.094 us |     1.594 us |  -2.500 us |   -61.06% |    FAST    |
|   I8    |      I32      |      2^20      |   6.113 us |       6.09% |   6.388 us |      10.02% |   0.275 us |   4.49% |   SAME   |     4.098 us |     3.051 us |  -1.047 us |   -25.55% |    FAST    |
|   I8    |      I32      |      2^24      |  15.292 us |       6.13% |  15.207 us |       6.09% |  -0.085 us |  -0.56% |   SAME   |    11.264 us |     9.365 us |  -1.899 us |   -16.86% |    FAST    |
|   I8    |      I32      |      2^28      | 141.213 us |       0.11% | 141.350 us |       0.41% |   0.137 us |   0.10% |   SAME   |   137.296 us |   135.994 us |  -1.302 us |    -0.95% |    SAME    |
|   I8    |      I64      |      2^16      |   5.919 us |       7.47% |   5.926 us |       6.03% |   0.007 us |   0.12% |   SAME   |     4.094 us |     1.604 us |  -2.490 us |   -60.82% |    FAST    |
|   I8    |      I64      |      2^20      |   6.111 us |       5.90% |   6.422 us |      10.09% |   0.310 us |   5.07% |   SAME   |     4.098 us |     3.097 us |  -1.001 us |   -24.43% |    FAST    |
|   I8    |      I64      |      2^24      |  15.775 us |       5.14% |  15.685 us |       5.19% |  -0.091 us |  -0.57% |   SAME   |    11.875 us |     9.387 us |  -2.489 us |   -20.96% |    FAST    |
|   I8    |      I64      |      2^28      | 143.584 us |       0.49% | 143.338 us |       0.34% |  -0.246 us |  -0.17% |   SAME   |   141.093 us |   137.760 us |  -3.333 us |    -2.36% |    FAST    |
|   I16   |      I32      |      2^16      |   5.936 us |       6.48% |   5.841 us |       9.35% |  -0.095 us |  -1.60% |   SAME   |     4.094 us |     1.627 us |  -2.467 us |   -60.26% |    FAST    |
|   I16   |      I32      |      2^20      |   6.649 us |      11.53% |   6.616 us |      11.37% |  -0.033 us |  -0.49% |   SAME   |     4.098 us |     3.023 us |  -1.075 us |   -26.23% |    FAST    |
|   I16   |      I32      |      2^24      |  22.386 us |       1.29% |  22.510 us |       2.15% |   0.123 us |   0.55% |   SAME   |    16.525 us |    18.175 us |   1.650 us |     9.98% |    SLOW    |
|   I16   |      I32      |      2^28      | 237.871 us |       0.33% | 272.302 us |       0.11% |  34.431 us |  14.47% |   SLOW   |   233.873 us |   268.593 us |  34.719 us |    14.85% |    SLOW    |
|   I16   |      I64      |      2^16      |   5.623 us |      13.25% |   5.612 us |      13.14% |  -0.011 us |  -0.20% |   SAME   |     4.094 us |     1.604 us |  -2.490 us |   -60.82% |    FAST    |
|   I16   |      I64      |      2^20      |   6.673 us |      11.74% |   6.706 us |      11.73% |   0.033 us |   0.50% |   SAME   |     4.098 us |     3.101 us |  -0.997 us |   -24.32% |    FAST    |
|   I16   |      I64      |      2^24      |  22.365 us |       1.38% |  22.528 us |       1.83% |   0.163 us |   0.73% |   SAME   |    16.525 us |    18.208 us |   1.683 us |    10.19% |    SLOW    |
|   I16   |      I64      |      2^28      | 239.449 us |       0.17% | 272.324 us |       0.19% |  32.875 us |  13.73% |   SLOW   |   235.292 us |   268.593 us |  33.301 us |    14.15% |    SLOW    |
|   F32   |      I32      |      2^16      |   5.787 us |      10.26% |   5.779 us |      10.38% |  -0.008 us |  -0.13% |   SAME   |     4.094 us |     1.575 us |  -2.519 us |   -61.52% |    FAST    |
|   F32   |      I32      |      2^20      |   8.008 us |       3.51% |   7.970 us |       3.98% |  -0.038 us |  -0.47% |   SAME   |     4.098 us |     2.875 us |  -1.223 us |   -29.84% |    FAST    |
|   F32   |      I32      |      2^24      |  35.876 us |       2.71% |  38.696 us |       0.88% |   2.820 us |   7.86% |   SLOW   |    32.666 us |    34.847 us |   2.182 us |     6.68% |    SLOW    |
|   F32   |      I32      |      2^28      | 454.211 us |       0.26% | 539.121 us |       0.17% |  84.909 us |  18.69% |   SLOW   |   452.508 us |   535.671 us |  83.163 us |    18.38% |    SLOW    |
|   F32   |      I64      |      2^16      |   5.958 us |       4.79% |   5.942 us |       5.19% |  -0.017 us |  -0.28% |   SAME   |     4.094 us |     1.590 us |  -2.504 us |   -61.17% |    FAST    |
|   F32   |      I64      |      2^20      |   8.016 us |       3.59% |   8.033 us |       3.33% |   0.018 us |   0.22% |   SAME   |     4.098 us |     2.880 us |  -1.218 us |   -29.73% |    FAST    |
|   F32   |      I64      |      2^24      |  35.917 us |       2.54% |  38.719 us |       0.85% |   2.802 us |   7.80% |   SLOW   |    32.694 us |    34.865 us |   2.171 us |     6.64% |    SLOW    |
|   F32   |      I64      |      2^28      | 453.938 us |       0.27% | 539.019 us |       0.16% |  85.081 us |  18.74% |   SLOW   |   452.062 us |   535.638 us |  83.577 us |    18.49% |    SLOW    |
|   F64   |      I32      |      2^16      |   6.006 us |       4.47% |   5.973 us |       4.50% |  -0.033 us |  -0.55% |   SAME   |     4.094 us |     1.594 us |  -2.501 us |   -61.08% |    FAST    |
|   F64   |      I32      |      2^20      |  10.086 us |       2.80% |  10.059 us |       3.17% |  -0.027 us |  -0.26% |   SAME   |     6.141 us |     5.197 us |  -0.944 us |   -15.37% |    FAST    |
|   F64   |      I32      |      2^24      |  65.164 us |       1.05% |  71.574 us |       0.51% |   6.411 us |   9.84% |   SLOW   |    59.868 us |    68.208 us |   8.340 us |    13.93% |    SLOW    |
|   F64   |      I32      |      2^28      | 909.760 us |       0.20% |   1.073 ms |       0.05% | 163.440 us |  17.97% |   SLOW   |   905.075 us |     1.070 ms | 164.766 us |    18.20% |    SLOW    |
|   F64   |      I64      |      2^16      |   5.962 us |       4.90% |   5.957 us |       5.03% |  -0.005 us |  -0.09% |   SAME   |     4.094 us |     1.583 us |  -2.511 us |   -61.34% |    FAST    |
|   F64   |      I64      |      2^20      |  10.092 us |       2.78% |  10.089 us |       2.82% |  -0.003 us |  -0.03% |   SAME   |     6.141 us |     5.192 us |  -0.950 us |   -15.46% |    FAST    |
|   F64   |      I64      |      2^24      |  65.317 us |       0.74% |  71.592 us |       0.53% |   6.275 us |   9.61% |   SLOW   |    59.828 us |    68.206 us |   8.378 us |    14.00% |    SLOW    |
|   F64   |      I64      |      2^28      | 909.178 us |       0.19% |   1.073 ms |       0.06% | 164.056 us |  18.04% |   SLOW   |   904.768 us |     1.070 ms | 165.069 us |    18.24% |    SLOW    |
|  I128   |      I32      |      2^16      |   5.997 us |       4.65% |   5.947 us |       5.01% |  -0.050 us |  -0.83% |   SAME   |     4.094 us |     1.593 us |  -2.501 us |   -61.10% |    FAST    |
|  I128   |      I32      |      2^20      |  14.195 us |       1.87% |  14.194 us |       1.96% |  -0.001 us |  -0.00% |   SAME   |     8.188 us |     9.372 us |   1.184 us |    14.46% |    SLOW    |
|  I128   |      I32      |      2^24      | 120.742 us |       0.76% | 138.904 us |       0.47% |  18.162 us |  15.04% |   SLOW   |   116.723 us |   134.950 us |  18.227 us |    15.62% |    SLOW    |
|  I128   |      I32      |      2^28      |   1.811 ms |       0.15% |   2.142 ms |       0.02% | 330.608 us |  18.25% |   SLOW   |     1.807 ms |     2.140 ms | 333.116 us |    18.44% |    SLOW    |
|  I128   |      I64      |      2^16      |   5.979 us |       5.28% |   5.991 us |       4.69% |   0.012 us |   0.21% |   SAME   |     4.094 us |     1.594 us |  -2.500 us |   -61.07% |    FAST    |
|  I128   |      I64      |      2^20      |  14.176 us |       1.96% |  14.138 us |       2.08% |  -0.038 us |  -0.27% |   SAME   |     8.188 us |     9.378 us |   1.190 us |    14.53% |    SLOW    |
|  I128   |      I64      |      2^24      | 120.699 us |       0.75% | 138.979 us |       0.35% |  18.280 us |  15.15% |   SLOW   |   116.648 us |   134.950 us |  18.303 us |    15.69% |    SLOW    |
|  I128   |      I64      |      2^28      |   1.811 ms |       0.15% |   2.142 ms |       0.02% | 330.955 us |  18.28% |   SLOW   |     1.805 ms |     2.140 ms | 334.179 us |    18.51% |    SLOW    |

# triad

## [0] NVIDIA B200

|  T{ct}  |  OffsetT{ct}  |  Elements{io}  |   Ref Time |   Ref Noise |   Cmp Time |   Cmp Noise |       Diff |   %Diff |  Status  |   B Ref Time |   B Cmp Time |     B Diff |   B %Diff |  B Status  |
|---------|---------------|----------------|------------|-------------|------------|-------------|------------|---------|----------|--------------|--------------|------------|-----------|------------|
|   I8    |      I32      |      2^16      |   5.814 us |       9.73% |   5.905 us |       6.76% |   0.091 us |   1.57% |   SAME   |     4.094 us |     1.602 us |  -2.493 us |   -60.88% |    FAST    |
|   I8    |      I32      |      2^20      |   6.128 us |       6.23% |   6.365 us |       9.87% |   0.237 us |   3.87% |   SAME   |     4.098 us |     3.039 us |  -1.059 us |   -25.83% |    FAST    |
|   I8    |      I32      |      2^24      |  16.207 us |       1.78% |  16.208 us |       1.79% |   0.001 us |   0.01% |   SAME   |    12.294 us |    10.362 us |  -1.932 us |   -15.72% |    FAST    |
|   I8    |      I32      |      2^28      | 149.994 us |       0.59% | 150.658 us |       0.63% |   0.664 us |   0.44% |   SAME   |   147.459 us |   145.001 us |  -2.458 us |    -1.67% |    FAST    |
|   I8    |      I64      |      2^16      |   6.006 us |       4.56% |   5.974 us |       4.71% |  -0.032 us |  -0.53% |   SAME   |     4.094 us |     1.613 us |  -2.481 us |   -60.60% |    FAST    |
|   I8    |      I64      |      2^20      |   6.083 us |       5.23% |   6.343 us |       9.49% |   0.260 us |   4.27% |   SAME   |     4.098 us |     3.083 us |  -1.015 us |   -24.76% |    FAST    |
|   I8    |      I64      |      2^24      |  16.208 us |       1.73% |  16.172 us |       1.79% |  -0.037 us |  -0.23% |   SAME   |    12.294 us |    10.623 us |  -1.671 us |   -13.59% |    FAST    |
|   I8    |      I64      |      2^28      | 153.088 us |       0.50% | 153.434 us |       0.21% |   0.346 us |   0.23% |   SLOW   |   149.586 us |   147.658 us |  -1.928 us |    -1.29% |    FAST    |
|   I16   |      I32      |      2^16      |   5.718 us |      11.90% |   5.721 us |      11.60% |   0.003 us |   0.05% |   SAME   |     4.094 us |     1.605 us |  -2.489 us |   -60.81% |    FAST    |
|   I16   |      I32      |      2^20      |   6.924 us |      13.05% |   7.008 us |      12.94% |   0.084 us |   1.21% |   SAME   |     4.098 us |     2.979 us |  -1.119 us |   -27.30% |    FAST    |
|   I16   |      I32      |      2^24      |  22.350 us |       1.38% |  22.718 us |       2.59% |   0.369 us |   1.65% |   SLOW   |    16.474 us |    18.079 us |   1.605 us |     9.74% |    SLOW    |
|   I16   |      I32      |      2^28      | 239.548 us |       0.12% | 272.308 us |       0.13% |  32.761 us |  13.68% |   SLOW   |   235.182 us |   268.432 us |  33.249 us |    14.14% |    SLOW    |
|   I16   |      I64      |      2^16      |   5.888 us |       8.48% |   5.782 us |       9.90% |  -0.106 us |  -1.80% |   SAME   |     4.094 us |     1.592 us |  -2.502 us |   -61.12% |    FAST    |
|   I16   |      I64      |      2^20      |   7.166 us |      12.98% |   7.195 us |      12.81% |   0.030 us |   0.41% |   SAME   |     4.098 us |     3.083 us |  -1.015 us |   -24.78% |    FAST    |
|   I16   |      I64      |      2^24      |  22.381 us |       1.20% |  22.704 us |       2.70% |   0.322 us |   1.44% |   SLOW   |    17.928 us |    18.183 us |   0.255 us |     1.42% |    SLOW    |
|   I16   |      I64      |      2^28      | 241.571 us |       0.12% | 272.374 us |       0.17% |  30.803 us |  12.75% |   SLOW   |   237.152 us |   268.438 us |  31.286 us |    13.19% |    SLOW    |
|   F32   |      I32      |      2^16      |   5.989 us |       4.50% |   5.950 us |       4.88% |  -0.039 us |  -0.66% |   SAME   |     4.094 us |     1.594 us |  -2.500 us |   -61.06% |    FAST    |
|   F32   |      I32      |      2^20      |   8.021 us |       3.47% |   7.989 us |       3.89% |  -0.032 us |  -0.39% |   SAME   |     4.098 us |     2.865 us |  -1.233 us |   -30.09% |    FAST    |
|   F32   |      I32      |      2^24      |  36.517 us |       1.70% |  38.782 us |       0.80% |   2.266 us |   6.20% |   SLOW   |    32.731 us |    34.859 us |   2.128 us |     6.50% |    SLOW    |
|   F32   |      I32      |      2^28      | 453.094 us |       0.24% | 539.057 us |       0.16% |  85.962 us |  18.97% |   SLOW   |   452.672 us |   535.696 us |  83.024 us |    18.34% |    SLOW    |
|   F32   |      I64      |      2^16      |   5.973 us |       4.63% |   5.960 us |       5.83% |  -0.014 us |  -0.23% |   SAME   |     4.094 us |     1.606 us |  -2.488 us |   -60.77% |    FAST    |
|   F32   |      I64      |      2^20      |   8.038 us |       3.43% |   8.009 us |       3.70% |  -0.028 us |  -0.35% |   SAME   |     4.098 us |     2.870 us |  -1.228 us |   -29.96% |    FAST    |
|   F32   |      I64      |      2^24      |  36.686 us |       0.97% |  38.769 us |       0.86% |   2.083 us |   5.68% |   SLOW   |    32.731 us |    34.859 us |   2.127 us |     6.50% |    SLOW    |
|   F32   |      I64      |      2^28      | 452.862 us |       0.19% | 539.153 us |       0.17% |  86.292 us |  19.05% |   SLOW   |   452.065 us |   535.694 us |  83.629 us |    18.50% |    SLOW    |
|   F64   |      I32      |      2^16      |   5.980 us |       4.76% |   5.963 us |       4.62% |  -0.016 us |  -0.27% |   SAME   |     4.094 us |     1.609 us |  -2.486 us |   -60.71% |    FAST    |
|   F64   |      I32      |      2^20      |  10.062 us |       2.89% |  10.054 us |       2.78% |  -0.009 us |  -0.09% |   SAME   |     6.141 us |     5.194 us |  -0.947 us |   -15.43% |    FAST    |
|   F64   |      I32      |      2^24      |  64.131 us |       1.48% |  71.512 us |       0.48% |   7.381 us |  11.51% |   SLOW   |    59.679 us |    68.255 us |   8.576 us |    14.37% |    SLOW    |
|   F64   |      I32      |      2^28      | 907.488 us |       0.24% |   1.073 ms |       0.05% | 165.704 us |  18.26% |   SLOW   |   904.912 us |     1.070 ms | 164.891 us |    18.22% |    SLOW    |
|   F64   |      I64      |      2^16      |   5.975 us |       4.58% |   5.921 us |       5.35% |  -0.054 us |  -0.90% |   SAME   |     4.094 us |     1.616 us |  -2.478 us |   -60.53% |    FAST    |
|   F64   |      I64      |      2^20      |  10.087 us |       2.88% |  10.049 us |       3.08% |  -0.039 us |  -0.39% |   SAME   |     6.141 us |     5.190 us |  -0.951 us |   -15.48% |    FAST    |
|   F64   |      I64      |      2^24      |  64.740 us |       1.54% |  71.541 us |       0.52% |   6.801 us |  10.50% |   SLOW   |    59.532 us |    68.144 us |   8.612 us |    14.47% |    SLOW    |
|   F64   |      I64      |      2^28      | 907.463 us |       0.25% |   1.073 ms |       0.04% | 165.692 us |  18.26% |   SLOW   |   904.402 us |     1.070 ms | 165.408 us |    18.29% |    SLOW    |
|  I128   |      I32      |      2^16      |   5.976 us |       4.80% |   5.969 us |       5.03% |  -0.007 us |  -0.11% |   SAME   |     4.094 us |     1.596 us |  -2.498 us |   -61.02% |    FAST    |
|  I128   |      I32      |      2^20      |  14.157 us |       1.98% |  14.101 us |       2.22% |  -0.056 us |  -0.40% |   SAME   |     8.188 us |     9.375 us |   1.187 us |    14.50% |    SLOW    |
|  I128   |      I32      |      2^24      | 119.885 us |       0.79% | 138.877 us |       0.45% |  18.992 us |  15.84% |   SLOW   |   116.405 us |   134.968 us |  18.563 us |    15.95% |    SLOW    |
|  I128   |      I32      |      2^28      |   1.812 ms |       0.11% |   2.142 ms |       0.02% | 329.563 us |  18.18% |   SLOW   |     1.806 ms |     2.140 ms | 333.777 us |    18.48% |    SLOW    |
|  I128   |      I64      |      2^16      |   5.983 us |       4.41% |   6.001 us |       5.10% |   0.017 us |   0.29% |   SAME   |     4.094 us |     1.578 us |  -2.516 us |   -61.46% |    FAST    |
|  I128   |      I64      |      2^20      |  14.136 us |       2.12% |  14.144 us |       2.09% |   0.008 us |   0.06% |   SAME   |     8.188 us |     9.380 us |   1.192 us |    14.56% |    SLOW    |
|  I128   |      I64      |      2^24      | 120.039 us |       0.76% | 138.907 us |       0.43% |  18.869 us |  15.72% |   SLOW   |   116.317 us |   134.990 us |  18.673 us |    16.05% |    SLOW    |
|  I128   |      I64      |      2^28      |   1.812 ms |       0.11% |   2.142 ms |       0.03% | 329.903 us |  18.21% |   SLOW   |     1.805 ms |     2.140 ms | 334.969 us |    18.56% |    SLOW    |

# nstream

## [0] NVIDIA B200

|  T{ct}  |  OffsetT{ct}  |  Elements{io}  |   Ref Time |   Ref Noise |   Cmp Time |   Cmp Noise |       Diff |   %Diff |  Status  |   B Ref Time |   B Cmp Time |     B Diff |   B %Diff |  B Status  |
|---------|---------------|----------------|------------|-------------|------------|-------------|------------|---------|----------|--------------|--------------|------------|-----------|------------|
|   I8    |      I32      |      2^16      |   5.989 us |       4.55% |   5.949 us |       5.01% |  -0.039 us |  -0.66% |   SAME   |     4.094 us |     1.634 us |  -2.460 us |   -60.09% |    FAST    |
|   I8    |      I32      |      2^20      |   6.342 us |       9.26% |   6.419 us |      10.11% |   0.077 us |   1.21% |   SAME   |     4.098 us |     3.117 us |  -0.981 us |   -23.94% |    FAST    |
|   I8    |      I32      |      2^24      |  19.445 us |       4.85% |  19.424 us |       4.68% |  -0.021 us |  -0.11% |   SAME   |    14.343 us |    13.192 us |  -1.151 us |    -8.02% |    FAST    |
|   I8    |      I32      |      2^28      | 212.863 us |       0.10% | 211.458 us |       0.43% |  -1.405 us |  -0.66% |   FAST   |   209.033 us |   205.241 us |  -3.792 us |    -1.81% |    FAST    |
|   I8    |      I64      |      2^16      |   6.009 us |       4.13% |   5.967 us |       4.81% |  -0.042 us |  -0.70% |   SAME   |     4.094 us |     1.645 us |  -2.449 us |   -59.82% |    FAST    |
|   I8    |      I64      |      2^20      |   6.334 us |       8.91% |   6.337 us |       9.49% |   0.003 us |   0.05% |   SAME   |     4.098 us |     3.126 us |  -0.972 us |   -23.71% |    FAST    |
|   I8    |      I64      |      2^24      |  19.865 us |       4.15% |  19.762 us |       4.17% |  -0.102 us |  -0.51% |   SAME   |    14.343 us |    13.214 us |  -1.129 us |    -7.87% |    FAST    |
|   I8    |      I64      |      2^28      | 215.113 us |       0.26% | 214.918 us |       0.08% |  -0.195 us |  -0.09% |   FAST   |   212.031 us |   208.688 us |  -3.342 us |    -1.58% |    FAST    |
|   I16   |      I32      |      2^16      |   6.009 us |       4.15% |   5.982 us |       4.60% |  -0.027 us |  -0.45% |   SAME   |     4.094 us |     1.614 us |  -2.480 us |   -60.58% |    FAST    |
|   I16   |      I32      |      2^20      |   7.932 us |       6.13% |   7.939 us |       5.75% |   0.006 us |   0.08% |   SAME   |     4.098 us |     3.114 us |  -0.984 us |   -24.01% |    FAST    |
|   I16   |      I32      |      2^24      |  26.983 us |       2.80% |  28.431 us |       1.60% |   1.448 us |   5.37% |   SLOW   |    22.510 us |    23.762 us |   1.252 us |     5.56% |    SLOW    |
|   I16   |      I32      |      2^28      | 322.852 us |       0.30% | 362.068 us |       0.20% |  39.215 us |  12.15% |   SLOW   |   319.790 us |   357.594 us |  37.804 us |    11.82% |    SLOW    |
|   I16   |      I64      |      2^16      |   5.987 us |       4.39% |   5.932 us |       4.98% |  -0.055 us |  -0.92% |   SAME   |     4.094 us |     1.619 us |  -2.475 us |   -60.44% |    FAST    |
|   I16   |      I64      |      2^20      |   7.946 us |       6.01% |   7.860 us |       6.95% |  -0.086 us |  -1.08% |   SAME   |     4.098 us |     3.140 us |  -0.958 us |   -23.38% |    FAST    |
|   I16   |      I64      |      2^24      |  27.225 us |       3.23% |  28.478 us |       1.04% |   1.253 us |   4.60% |   SLOW   |    22.522 us |    23.750 us |   1.228 us |     5.45% |    SLOW    |
|   I16   |      I64      |      2^28      | 327.108 us |       0.26% | 361.990 us |       0.22% |  34.882 us |  10.66% |   SLOW   |   324.185 us |   357.615 us |  33.430 us |    10.31% |    SLOW    |
|   F32   |      I32      |      2^16      |   5.977 us |       4.80% |   5.978 us |       4.67% |   0.000 us |   0.01% |   SAME   |     4.094 us |     1.632 us |  -2.462 us |   -60.13% |    FAST    |
|   F32   |      I32      |      2^20      |   8.471 us |       7.81% |   8.566 us |       8.58% |   0.095 us |   1.12% |   SAME   |     4.514 us |     3.815 us |  -0.699 us |   -15.48% |    FAST    |
|   F32   |      I32      |      2^24      |  46.447 us |       1.83% |  49.517 us |       1.50% |   3.070 us |   6.61% |   SLOW   |    41.011 us |    46.008 us |   4.997 us |    12.19% |    SLOW    |
|   F32   |      I32      |      2^28      | 598.549 us |       0.15% | 717.324 us |       0.12% | 118.774 us |  19.84% |   SLOW   |   592.635 us |   713.743 us | 121.108 us |    20.44% |    SLOW    |
|   F32   |      I64      |      2^16      |   5.988 us |       4.55% |   5.959 us |       4.85% |  -0.029 us |  -0.49% |   SAME   |     4.094 us |     1.614 us |  -2.480 us |   -60.57% |    FAST    |
|   F32   |      I64      |      2^20      |   8.514 us |       8.41% |   8.492 us |       8.07% |  -0.022 us |  -0.26% |   SAME   |     4.715 us |     3.826 us |  -0.889 us |   -18.86% |    FAST    |
|   F32   |      I64      |      2^24      |  46.541 us |       1.71% |  49.648 us |       1.55% |   3.107 us |   6.68% |   SLOW   |    41.069 us |    45.971 us |   4.902 us |    11.94% |    SLOW    |
|   F32   |      I64      |      2^28      | 599.451 us |       0.14% | 717.347 us |       0.13% | 117.896 us |  19.67% |   SLOW   |   593.715 us |   713.745 us | 120.029 us |    20.22% |    SLOW    |
|   F64   |      I32      |      2^16      |   5.958 us |       4.65% |   5.965 us |       4.72% |   0.007 us |   0.11% |   SAME   |     4.094 us |     1.625 us |  -2.469 us |   -60.31% |    FAST    |
|   F64   |      I32      |      2^20      |  11.491 us |       7.71% |  11.504 us |       7.48% |   0.013 us |   0.11% |   SAME   |     6.141 us |     5.222 us |  -0.919 us |   -14.96% |    FAST    |
|   F64   |      I32      |      2^24      |  84.027 us |       0.80% |  84.034 us |       0.92% |   0.007 us |   0.01% |   SAME   |    77.789 us |    74.972 us |  -2.817 us |    -3.62% |    FAST    |
|   F64   |      I32      |      2^28      |   1.184 ms |       0.05% |   1.184 ms |       0.05% |   0.029 us |   0.00% |   SAME   |     1.176 ms |     1.173 ms |  -2.920 us |    -0.25% |    SAME    |
|   F64   |      I64      |      2^16      |   5.966 us |       4.90% |   5.963 us |       4.86% |  -0.003 us |  -0.06% |   SAME   |     4.094 us |     1.637 us |  -2.457 us |   -60.02% |    FAST    |
|   F64   |      I64      |      2^20      |  11.513 us |       7.79% |  11.494 us |       7.78% |  -0.019 us |  -0.17% |   SAME   |     6.141 us |     5.216 us |  -0.925 us |   -15.06% |    FAST    |
|   F64   |      I64      |      2^24      |  84.030 us |       0.82% |  84.060 us |       0.76% |   0.030 us |   0.04% |   SAME   |    77.804 us |    74.994 us |  -2.810 us |    -3.61% |    FAST    |
|   F64   |      I64      |      2^28      |   1.184 ms |       0.05% |   1.184 ms |       0.06% |   0.075 us |   0.01% |   SAME   |     1.176 ms |     1.173 ms |  -2.845 us |    -0.24% |    SAME    |
|  I128   |      I32      |      2^16      |   6.066 us |       5.68% |   6.017 us |       5.84% |  -0.049 us |  -0.81% |   SAME   |     4.094 us |     1.592 us |  -2.502 us |   -61.11% |    FAST    |
|  I128   |      I32      |      2^20      |  16.358 us |       2.88% |  16.406 us |       2.93% |   0.048 us |   0.29% |   SAME   |     8.188 us |     9.399 us |   1.211 us |    14.79% |    SLOW    |
|  I128   |      I32      |      2^24      | 156.487 us |       0.62% | 156.500 us |       0.61% |   0.014 us |   0.01% |   SAME   |   150.507 us |   147.953 us |  -2.554 us |    -1.70% |    FAST    |
|  I128   |      I32      |      2^28      |   2.356 ms |       0.04% |   2.356 ms |       0.04% |  -0.046 us |  -0.00% |   SAME   |     2.349 ms |     2.347 ms |  -1.491 us |    -0.06% |    SAME    |
|  I128   |      I64      |      2^16      |   6.084 us |       6.74% |   6.073 us |       6.19% |  -0.010 us |  -0.17% |   SAME   |     4.094 us |     1.607 us |  -2.487 us |   -60.74% |    FAST    |
|  I128   |      I64      |      2^20      |  16.395 us |       2.82% |  16.393 us |       2.69% |  -0.002 us |  -0.01% |   SAME   |     8.188 us |     9.410 us |   1.222 us |    14.92% |    SLOW    |
|  I128   |      I64      |      2^24      | 156.555 us |       0.64% | 156.506 us |       0.62% |  -0.049 us |  -0.03% |   SAME   |   150.525 us |   147.965 us |  -2.560 us |    -1.70% |    FAST    |
|  I128   |      I64      |      2^28      |   2.356 ms |       0.04% |   2.356 ms |       0.04% |  -0.012 us |  -0.00% |   SAME   |     2.349 ms |     2.347 ms |  -1.875 us |    -0.08% |    SAME    |

# Summary

- Total Matches: 160
  - Pass    (diff <= min_noise): 130
  - Unknown (infinite noise):    0
  - Failure (diff > min_noise):  190
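
A minimal sketch of how the Status columns can be read, not the actual code of nvbench_compare.py. For cold measurements, the summary wording (`diff <= min_noise`) suggests the threshold is the smaller of the two noise values. The batch columns show no noise, so the fixed 1% threshold below is an assumption: it is consistent with the batch rows in the tables above but is not stated anywhere in this PR. The names `classify` and `ASSUMED_BATCH_THRESHOLD` are hypothetical.

```python
def classify(ref_time, cmp_time, threshold):
    """Return (diff, frac_diff, status) for one reference/compare pair.

    threshold is a fraction of the reference time (0.01 means 1%).
    """
    diff = cmp_time - ref_time
    frac_diff = diff / ref_time
    if abs(frac_diff) <= threshold:
        status = "SAME"
    elif diff < 0:
        status = "FAST"
    else:
        status = "SLOW"
    return diff, frac_diff, status


# Cold measurement: the threshold is the smaller of the two noise values.
# Numbers taken from the "add" table, row I16 / I32 / 2^28 (microseconds).
cold = classify(237.871, 272.302, min(0.0033, 0.0011))

# Batch measurement: no noise is reported, so a fixed threshold is ASSUMED.
ASSUMED_BATCH_THRESHOLD = 0.01  # hypothetical, inferred from the tables
batch = classify(233.873, 268.593, ASSUMED_BATCH_THRESHOLD)

print(cold[2], batch[2])
```

Applied to the first `mul` row, the same function reports the cold pair as unchanged (a 0.08% difference against roughly 11.65% noise) while the batch pair is clearly faster, which is the kind of divergence this PR wants to surface.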

With the additional batch columns, the table becomes a bit unwieldy. We could drop the Diff and B Diff columns to make it narrower. Alternatively, we could emit two rows per benchmark, one for the cold and one for the batch measurement.
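
A rough sketch of what two rows per benchmark could look like, under the assumption that each result carries a cold and a batch measurement. The function `two_row_layout`, the dictionary keys, and the reduced column set are hypothetical and only illustrate the layout; nothing here is taken from nvbench_compare.py.

```python
def two_row_layout(results):
    """Render one cold row and one batch row per benchmark configuration."""
    lines = [
        "| Config | Kind | Ref Time | Cmp Time | %Diff | Status |",
        "|--------|------|----------|----------|-------|--------|",
    ]
    for r in results:
        for kind in ("cold", "batch"):
            m = r[kind]
            # Percentage difference relative to the reference time.
            pct = (m["cmp"] - m["ref"]) / m["ref"] * 100.0
            lines.append(
                f"| {r['config']} | {kind} | {m['ref']:.3f} us "
                f"| {m['cmp']:.3f} us | {pct:.2f}% | {m['status']} |"
            )
    return "\n".join(lines)


# One entry from the "mul" table (I8 / I32 / 2^16), times in microseconds.
# The statuses are copied from the table rather than recomputed.
rows = [
    {
        "config": "I8 I32 2^16",
        "cold": {"ref": 5.729, "cmp": 5.734, "status": "SAME"},
        "batch": {"ref": 4.094, "cmp": 1.622, "status": "FAST"},
    }
]
table = two_row_layout(rows)
print(table)
```

This halves the table width at the cost of doubling its length, and cold and batch timings alternate within each column.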

bernhardmgruber, Aug 14 '25 17:08

I like the idea of splitting them onto a new line; I think it'd be cleaner.

Or we could make them separate tables? That way you could still quickly scan a column to check for outliers. That'd be harder if the timings alternated between cold and batch.

alliepiper, Aug 20 '25 22:08