BitFaster.Caching
Implement LFU sketch using arm64 intrinsics
In the results below, the `*BlockAvx` benchmarks run on the ARM64 intrinsics; FlatAvx is not implemented.
Mac M2
Repros on commit 7d7d023, but not newest.
BenchmarkDotNet v0.13.12, macOS Sonoma 14.5 (23F79) [Darwin 23.5.0]
Apple M2, 1 CPU, 8 logical and 8 physical cores
.NET SDK 8.0.100
[Host] : .NET 6.0.30 (6.0.3024.21525), Arm64 RyuJIT AdvSIMD
.NET 6.0 : .NET 6.0.30 (6.0.3024.21525), Arm64 RyuJIT AdvSIMD
.NET 8.0 : .NET 8.0.0 (8.0.23.53103), Arm64 RyuJIT AdvSIMD
| Method | Runtime | Size | Mean | Error | StdDev | Ratio | Allocated |
|---|---|---|---|---|---|---|---|
| IncFlat | .NET 6.0 | 1024 | 13.870 ns | 0.0025 ns | 0.0020 ns | 1.00 | - |
| IncBlock | .NET 6.0 | 1024 | 13.042 ns | 0.0172 ns | 0.0161 ns | 0.94 | - |
| IncBlockAvx | .NET 6.0 | 1024 | 6.736 ns | 0.1010 ns | 0.0944 ns | 0.49 | - |
| IncFlat | .NET 8.0 | 1024 | 6.348 ns | 0.0012 ns | 0.0009 ns | 1.00 | - |
| IncBlock | .NET 8.0 | 1024 | 9.951 ns | 0.0909 ns | 0.0851 ns | 1.57 | - |
| IncBlockAvx | .NET 8.0 | 1024 | 5.842 ns | 0.0145 ns | 0.0135 ns | 0.92 | - |
| IncFlat | .NET 6.0 | 32768 | 14.036 ns | 0.2049 ns | 0.1916 ns | 1.00 | - |
| IncBlock | .NET 6.0 | 32768 | 13.799 ns | 0.0731 ns | 0.0683 ns | 0.98 | - |
| IncBlockAvx | .NET 6.0 | 32768 | 7.490 ns | 0.0143 ns | 0.0134 ns | 0.53 | - |
| IncFlat | .NET 8.0 | 32768 | 7.036 ns | 0.0955 ns | 0.0893 ns | 1.00 | - |
| IncBlock | .NET 8.0 | 32768 | 10.223 ns | 0.0179 ns | 0.0150 ns | 1.45 | - |
| IncBlockAvx | .NET 8.0 | 32768 | 6.507 ns | 0.0051 ns | 0.0042 ns | 0.92 | - |
| IncFlat | .NET 6.0 | 524288 | 14.622 ns | 0.0821 ns | 0.0768 ns | 1.00 | - |
| IncBlock | .NET 6.0 | 524288 | 16.510 ns | 0.1516 ns | 0.1418 ns | 1.13 | - |
| IncBlockAvx | .NET 6.0 | 524288 | 7.660 ns | 0.0179 ns | 0.0159 ns | 0.52 | - |
| IncFlat | .NET 8.0 | 524288 | 7.607 ns | 0.0578 ns | 0.0541 ns | 1.00 | - |
| IncBlock | .NET 8.0 | 524288 | 13.160 ns | 0.0397 ns | 0.0371 ns | 1.73 | - |
| IncBlockAvx | .NET 8.0 | 524288 | 6.672 ns | 0.0112 ns | 0.0099 ns | 0.88 | - |
| IncFlat | .NET 6.0 | 8388608 | 61.644 ns | 0.0636 ns | 0.0564 ns | 1.00 | - |
| IncBlock | .NET 6.0 | 8388608 | 53.673 ns | 0.0489 ns | 0.0458 ns | 0.87 | - |
| IncBlockAvx | .NET 6.0 | 8388608 | 30.969 ns | 0.0283 ns | 0.0236 ns | 0.50 | - |
| IncFlat | .NET 8.0 | 8388608 | 34.704 ns | 0.0418 ns | 0.0349 ns | 1.00 | - |
| IncBlock | .NET 8.0 | 8388608 | 39.008 ns | 0.0364 ns | 0.0322 ns | 1.12 | - |
| IncBlockAvx | .NET 8.0 | 8388608 | 27.053 ns | 0.0253 ns | 0.0224 ns | 0.78 | - |
| IncFlat | .NET 6.0 | 134217728 | 68.909 ns | 0.1676 ns | 0.1486 ns | 1.00 | - |
| IncBlock | .NET 6.0 | 134217728 | 63.177 ns | 0.0881 ns | 0.0824 ns | 0.92 | - |
| IncBlockAvx | .NET 6.0 | 134217728 | 35.213 ns | 0.0200 ns | 0.0177 ns | 0.51 | - |
| IncFlat | .NET 8.0 | 134217728 | 39.842 ns | 0.0742 ns | 0.0657 ns | 1.00 | - |
| IncBlock | .NET 8.0 | 134217728 | 44.355 ns | 0.0494 ns | 0.0438 ns | 1.11 | - |
| IncBlockAvx | .NET 8.0 | 134217728 | 30.773 ns | 0.0278 ns | 0.0247 ns | 0.77 | - |
| Method | Runtime | Size | Mean | Error | StdDev | Ratio | Allocated |
|---|---|---|---|---|---|---|---|
| FrequencyFlat | .NET 6.0 | 1024 | 22.029 ns | 0.0152 ns | 0.0134 ns | 1.00 | - |
| FrequencyBlock | .NET 6.0 | 1024 | 17.766 ns | 0.0166 ns | 0.0147 ns | 0.81 | - |
| FrequencyBlockAvx | .NET 6.0 | 1024 | 11.007 ns | 0.0011 ns | 0.0008 ns | 0.50 | - |
| FrequencyFlat | .NET 8.0 | 1024 | 10.750 ns | 0.0018 ns | 0.0015 ns | 1.00 | - |
| FrequencyBlock | .NET 8.0 | 1024 | 12.805 ns | 0.0063 ns | 0.0056 ns | 1.19 | - |
| FrequencyBlockAvx | .NET 8.0 | 1024 | 8.762 ns | 0.0895 ns | 0.0837 ns | 0.81 | - |
| FrequencyFlat | .NET 6.0 | 32768 | 22.058 ns | 0.0131 ns | 0.0103 ns | 1.00 | - |
| FrequencyBlock | .NET 6.0 | 32768 | 19.284 ns | 0.0289 ns | 0.0241 ns | 0.87 | - |
| FrequencyBlockAvx | .NET 6.0 | 32768 | 11.791 ns | 0.1241 ns | 0.1161 ns | 0.53 | - |
| FrequencyFlat | .NET 8.0 | 32768 | 11.351 ns | 0.0099 ns | 0.0092 ns | 1.00 | - |
| FrequencyBlock | .NET 8.0 | 32768 | 13.599 ns | 0.0457 ns | 0.0405 ns | 1.20 | - |
| FrequencyBlockAvx | .NET 8.0 | 32768 | 9.286 ns | 0.0235 ns | 0.0208 ns | 0.82 | - |
| FrequencyFlat | .NET 6.0 | 524288 | 22.361 ns | 0.3072 ns | 0.2873 ns | 1.00 | - |
| FrequencyBlock | .NET 6.0 | 524288 | 20.041 ns | 0.0355 ns | 0.0332 ns | 0.90 | - |
| FrequencyBlockAvx | .NET 6.0 | 524288 | 12.155 ns | 0.0188 ns | 0.0157 ns | 0.54 | - |
| FrequencyFlat | .NET 8.0 | 524288 | 11.830 ns | 0.0348 ns | 0.0326 ns | 1.00 | - |
| FrequencyBlock | .NET 8.0 | 524288 | 14.052 ns | 0.0447 ns | 0.0397 ns | 1.19 | - |
| FrequencyBlockAvx | .NET 8.0 | 524288 | 9.436 ns | 0.0219 ns | 0.0205 ns | 0.80 | - |
| FrequencyFlat | .NET 6.0 | 8388608 | 105.323 ns | 0.1536 ns | 0.1283 ns | 1.00 | - |
| FrequencyBlock | .NET 6.0 | 8388608 | 62.690 ns | 0.0273 ns | 0.0228 ns | 0.60 | - |
| FrequencyBlockAvx | .NET 6.0 | 8388608 | 43.439 ns | 0.0327 ns | 0.0255 ns | 0.41 | - |
| FrequencyFlat | .NET 8.0 | 8388608 | 59.741 ns | 0.1223 ns | 0.1085 ns | 1.00 | - |
| FrequencyBlock | .NET 8.0 | 8388608 | 59.074 ns | 0.0829 ns | 0.0735 ns | 0.99 | - |
| FrequencyBlockAvx | .NET 8.0 | 8388608 | 40.865 ns | 0.0844 ns | 0.0789 ns | 0.68 | - |
| FrequencyFlat | .NET 6.0 | 134217728 | 121.675 ns | 0.4978 ns | 0.4656 ns | 1.00 | - |
| FrequencyBlock | .NET 6.0 | 134217728 | 72.443 ns | 0.0492 ns | 0.0411 ns | 0.60 | - |
| FrequencyBlockAvx | .NET 6.0 | 134217728 | 48.987 ns | 0.0746 ns | 0.0662 ns | 0.40 | - |
| FrequencyFlat | .NET 8.0 | 134217728 | 67.616 ns | 0.2006 ns | 0.1877 ns | 1.00 | - |
| FrequencyBlock | .NET 8.0 | 134217728 | 68.747 ns | 0.0631 ns | 0.0527 ns | 1.02 | - |
| FrequencyBlockAvx | .NET 8.0 | 134217728 | 46.334 ns | 0.1333 ns | 0.1113 ns | 0.69 | - |
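As a reading aid for the Ratio column, here is a minimal sketch that recomputes it for the Size = 1024 rows of the Mac M2 `Inc*` table. It assumes that Ratio is approximately each method's Mean divided by the Mean of the baseline (`IncFlat`) within the same runtime; the `ratios` helper and the `means_ns` dictionary are hypothetical names used only for illustration, and the numbers are copied from the table above.

```python
# Means (ns) copied from the Mac M2 Inc* table, Size = 1024.
means_ns = {
    ".NET 6.0": {"IncFlat": 13.870, "IncBlock": 13.042, "IncBlockAvx": 6.736},
    ".NET 8.0": {"IncFlat": 6.348, "IncBlock": 9.951, "IncBlockAvx": 5.842},
}


def ratios(by_method, baseline="IncFlat"):
    """Divide each method's mean by the baseline mean, rounded to 2 places.

    Assumption: this approximates how the Ratio column is derived; the
    benchmark tool may compute it from individual measurements instead.
    """
    base = by_method[baseline]
    return {method: round(mean / base, 2) for method, mean in by_method.items()}


for runtime, by_method in means_ns.items():
    print(runtime, ratios(by_method))
```

A ratio below 1.00 means the method is faster than `IncFlat` on that runtime, so under this assumption `IncBlockAvx` takes roughly half the baseline time on .NET 6.0 and slightly less than the baseline time on .NET 8.0.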
Windows Cobalt 100 (VM)
BenchmarkDotNet v0.13.12, Windows 11 (10.0.22000.2960/21H2/SunValley)
Pioneer, 1 CPU, 16 logical and 16 physical cores
.NET SDK 8.0.300
[Host] : .NET 6.0.30 (6.0.3024.21525), Arm64 RyuJIT AdvSIMD
.NET 6.0 : .NET 6.0.30 (6.0.3024.21525), Arm64 RyuJIT AdvSIMD
Job=.NET 6.0 Runtime=.NET 6.0 Alloc Ratio=NA
| Method | Size | Mean | Error | StdDev | Ratio | Allocated |
|---|---|---|---|---|---|---|
| IncFlat | 32768 | 23.19 ns | 0.013 ns | 0.012 ns | 1.00 | - |
| IncFlatAvx | 32768 | 23.41 ns | 0.017 ns | 0.015 ns | 1.01 | - |
| IncBlock | 32768 | 20.36 ns | 0.010 ns | 0.009 ns | 0.88 | - |
| IncBlockAvx | 32768 | 13.89 ns | 0.004 ns | 0.003 ns | 0.60 | - |
| IncFlat | 524288 | 64.55 ns | 1.262 ns | 2.243 ns | 1.00 | - |
| IncFlatAvx | 524288 | 64.56 ns | 1.251 ns | 1.753 ns | 1.01 | - |
| IncBlock | 524288 | 50.67 ns | 1.008 ns | 1.120 ns | 0.78 | - |
| IncBlockAvx | 524288 | 31.84 ns | 0.632 ns | 1.290 ns | 0.49 | - |
| IncFlat | 8388608 | 92.59 ns | 2.143 ns | 6.285 ns | 1.00 | - |
| IncFlatAvx | 8388608 | 91.21 ns | 2.231 ns | 6.507 ns | 0.99 | - |
| IncBlock | 8388608 | 82.57 ns | 1.642 ns | 4.710 ns | 0.90 | - |
| IncBlockAvx | 8388608 | 41.18 ns | 0.815 ns | 1.921 ns | 0.45 | - |
| IncFlat | 134217728 | 205.91 ns | 3.865 ns | 4.135 ns | 1.00 | - |
| IncFlatAvx | 134217728 | 205.51 ns | 2.790 ns | 2.473 ns | 0.99 | - |
| IncBlock | 134217728 | 183.31 ns | 3.610 ns | 5.061 ns | 0.89 | - |
| IncBlockAvx | 134217728 | 87.76 ns | 1.755 ns | 3.739 ns | 0.43 | - |
| Method | Size | Mean | Error | StdDev | Ratio | Allocated |
|---|---|---|---|---|---|---|
| FrequencyFlat | 32768 | 38.03 ns | 0.034 ns | 0.028 ns | 1.00 | - |
| FrequencyFlatAvx | 32768 | 38.05 ns | 0.040 ns | 0.037 ns | 1.00 | - |
| FrequencyBlock | 32768 | 25.98 ns | 0.006 ns | 0.005 ns | 0.68 | - |
| FrequencyBlockAvx | 32768 | 23.02 ns | 0.007 ns | 0.006 ns | 0.61 | - |
| FrequencyFlat | 524288 | 80.73 ns | 1.593 ns | 2.831 ns | 1.00 | - |
| FrequencyFlatAvx | 524288 | 81.06 ns | 1.606 ns | 3.056 ns | 1.01 | - |
| FrequencyBlock | 524288 | 56.54 ns | 1.009 ns | 1.381 ns | 0.70 | - |
| FrequencyBlockAvx | 524288 | 58.12 ns | 1.147 ns | 2.067 ns | 0.72 | - |
| FrequencyFlat | 8388608 | 111.51 ns | 2.220 ns | 6.077 ns | 1.00 | - |
| FrequencyFlatAvx | 8388608 | 113.42 ns | 2.580 ns | 7.525 ns | 1.02 | - |
| FrequencyBlock | 8388608 | 89.49 ns | 1.769 ns | 4.784 ns | 0.80 | - |
| FrequencyBlockAvx | 8388608 | 87.31 ns | 1.840 ns | 5.397 ns | 0.79 | - |
| FrequencyFlat | 134217728 | 226.23 ns | 4.506 ns | 4.215 ns | 1.00 | - |
| FrequencyFlatAvx | 134217728 | 223.52 ns | 2.770 ns | 2.591 ns | 0.99 | - |
| FrequencyBlock | 134217728 | 195.64 ns | 3.683 ns | 3.941 ns | 0.87 | - |
| FrequencyBlockAvx | 134217728 | 187.74 ns | 3.590 ns | 4.135 ns | 0.83 | - |