
Implement LFU sketch using arm64 intrinsics

Open bitfaster opened this issue 1 year ago • 5 comments

In the results below, the *BlockAvx benchmarks run on the ARM64 intrinsics; FlatAvx is not implemented for ARM64.

Mac M2

Reproduces on commit 7d7d023, but not on the latest commit.

BenchmarkDotNet v0.13.12, macOS Sonoma 14.5 (23F79) [Darwin 23.5.0]
Apple M2, 1 CPU, 8 logical and 8 physical cores
.NET SDK 8.0.100
  [Host]   : .NET 6.0.30 (6.0.3024.21525), Arm64 RyuJIT AdvSIMD
  .NET 6.0 : .NET 6.0.30 (6.0.3024.21525), Arm64 RyuJIT AdvSIMD
  .NET 8.0 : .NET 8.0.0 (8.0.23.53103), Arm64 RyuJIT AdvSIMD

[Column charts: Lfu SketchIncrement, Lfu SketchFrequency]

| Method | Runtime | Size | Mean | Error | StdDev | Ratio | Allocated |
|---|---|---:|---:|---:|---:|---:|---:|
| IncFlat | .NET 6.0 | 1024 | 13.870 ns | 0.0025 ns | 0.0020 ns | 1.00 | - |
| IncBlock | .NET 6.0 | 1024 | 13.042 ns | 0.0172 ns | 0.0161 ns | 0.94 | - |
| IncBlockAvx | .NET 6.0 | 1024 | 6.736 ns | 0.1010 ns | 0.0944 ns | 0.49 | - |
| IncFlat | .NET 8.0 | 1024 | 6.348 ns | 0.0012 ns | 0.0009 ns | 1.00 | - |
| IncBlock | .NET 8.0 | 1024 | 9.951 ns | 0.0909 ns | 0.0851 ns | 1.57 | - |
| IncBlockAvx | .NET 8.0 | 1024 | 5.842 ns | 0.0145 ns | 0.0135 ns | 0.92 | - |
| IncFlat | .NET 6.0 | 32768 | 14.036 ns | 0.2049 ns | 0.1916 ns | 1.00 | - |
| IncBlock | .NET 6.0 | 32768 | 13.799 ns | 0.0731 ns | 0.0683 ns | 0.98 | - |
| IncBlockAvx | .NET 6.0 | 32768 | 7.490 ns | 0.0143 ns | 0.0134 ns | 0.53 | - |
| IncFlat | .NET 8.0 | 32768 | 7.036 ns | 0.0955 ns | 0.0893 ns | 1.00 | - |
| IncBlock | .NET 8.0 | 32768 | 10.223 ns | 0.0179 ns | 0.0150 ns | 1.45 | - |
| IncBlockAvx | .NET 8.0 | 32768 | 6.507 ns | 0.0051 ns | 0.0042 ns | 0.92 | - |
| IncFlat | .NET 6.0 | 524288 | 14.622 ns | 0.0821 ns | 0.0768 ns | 1.00 | - |
| IncBlock | .NET 6.0 | 524288 | 16.510 ns | 0.1516 ns | 0.1418 ns | 1.13 | - |
| IncBlockAvx | .NET 6.0 | 524288 | 7.660 ns | 0.0179 ns | 0.0159 ns | 0.52 | - |
| IncFlat | .NET 8.0 | 524288 | 7.607 ns | 0.0578 ns | 0.0541 ns | 1.00 | - |
| IncBlock | .NET 8.0 | 524288 | 13.160 ns | 0.0397 ns | 0.0371 ns | 1.73 | - |
| IncBlockAvx | .NET 8.0 | 524288 | 6.672 ns | 0.0112 ns | 0.0099 ns | 0.88 | - |
| IncFlat | .NET 6.0 | 8388608 | 61.644 ns | 0.0636 ns | 0.0564 ns | 1.00 | - |
| IncBlock | .NET 6.0 | 8388608 | 53.673 ns | 0.0489 ns | 0.0458 ns | 0.87 | - |
| IncBlockAvx | .NET 6.0 | 8388608 | 30.969 ns | 0.0283 ns | 0.0236 ns | 0.50 | - |
| IncFlat | .NET 8.0 | 8388608 | 34.704 ns | 0.0418 ns | 0.0349 ns | 1.00 | - |
| IncBlock | .NET 8.0 | 8388608 | 39.008 ns | 0.0364 ns | 0.0322 ns | 1.12 | - |
| IncBlockAvx | .NET 8.0 | 8388608 | 27.053 ns | 0.0253 ns | 0.0224 ns | 0.78 | - |
| IncFlat | .NET 6.0 | 134217728 | 68.909 ns | 0.1676 ns | 0.1486 ns | 1.00 | - |
| IncBlock | .NET 6.0 | 134217728 | 63.177 ns | 0.0881 ns | 0.0824 ns | 0.92 | - |
| IncBlockAvx | .NET 6.0 | 134217728 | 35.213 ns | 0.0200 ns | 0.0177 ns | 0.51 | - |
| IncFlat | .NET 8.0 | 134217728 | 39.842 ns | 0.0742 ns | 0.0657 ns | 1.00 | - |
| IncBlock | .NET 8.0 | 134217728 | 44.355 ns | 0.0494 ns | 0.0438 ns | 1.11 | - |
| IncBlockAvx | .NET 8.0 | 134217728 | 30.773 ns | 0.0278 ns | 0.0247 ns | 0.77 | - |

| Method | Runtime | Size | Mean | Error | StdDev | Ratio | Allocated |
|---|---|---:|---:|---:|---:|---:|---:|
| FrequencyFlat | .NET 6.0 | 1024 | 22.029 ns | 0.0152 ns | 0.0134 ns | 1.00 | - |
| FrequencyBlock | .NET 6.0 | 1024 | 17.766 ns | 0.0166 ns | 0.0147 ns | 0.81 | - |
| FrequencyBlockAvx | .NET 6.0 | 1024 | 11.007 ns | 0.0011 ns | 0.0008 ns | 0.50 | - |
| FrequencyFlat | .NET 8.0 | 1024 | 10.750 ns | 0.0018 ns | 0.0015 ns | 1.00 | - |
| FrequencyBlock | .NET 8.0 | 1024 | 12.805 ns | 0.0063 ns | 0.0056 ns | 1.19 | - |
| FrequencyBlockAvx | .NET 8.0 | 1024 | 8.762 ns | 0.0895 ns | 0.0837 ns | 0.81 | - |
| FrequencyFlat | .NET 6.0 | 32768 | 22.058 ns | 0.0131 ns | 0.0103 ns | 1.00 | - |
| FrequencyBlock | .NET 6.0 | 32768 | 19.284 ns | 0.0289 ns | 0.0241 ns | 0.87 | - |
| FrequencyBlockAvx | .NET 6.0 | 32768 | 11.791 ns | 0.1241 ns | 0.1161 ns | 0.53 | - |
| FrequencyFlat | .NET 8.0 | 32768 | 11.351 ns | 0.0099 ns | 0.0092 ns | 1.00 | - |
| FrequencyBlock | .NET 8.0 | 32768 | 13.599 ns | 0.0457 ns | 0.0405 ns | 1.20 | - |
| FrequencyBlockAvx | .NET 8.0 | 32768 | 9.286 ns | 0.0235 ns | 0.0208 ns | 0.82 | - |
| FrequencyFlat | .NET 6.0 | 524288 | 22.361 ns | 0.3072 ns | 0.2873 ns | 1.00 | - |
| FrequencyBlock | .NET 6.0 | 524288 | 20.041 ns | 0.0355 ns | 0.0332 ns | 0.90 | - |
| FrequencyBlockAvx | .NET 6.0 | 524288 | 12.155 ns | 0.0188 ns | 0.0157 ns | 0.54 | - |
| FrequencyFlat | .NET 8.0 | 524288 | 11.830 ns | 0.0348 ns | 0.0326 ns | 1.00 | - |
| FrequencyBlock | .NET 8.0 | 524288 | 14.052 ns | 0.0447 ns | 0.0397 ns | 1.19 | - |
| FrequencyBlockAvx | .NET 8.0 | 524288 | 9.436 ns | 0.0219 ns | 0.0205 ns | 0.80 | - |
| FrequencyFlat | .NET 6.0 | 8388608 | 105.323 ns | 0.1536 ns | 0.1283 ns | 1.00 | - |
| FrequencyBlock | .NET 6.0 | 8388608 | 62.690 ns | 0.0273 ns | 0.0228 ns | 0.60 | - |
| FrequencyBlockAvx | .NET 6.0 | 8388608 | 43.439 ns | 0.0327 ns | 0.0255 ns | 0.41 | - |
| FrequencyFlat | .NET 8.0 | 8388608 | 59.741 ns | 0.1223 ns | 0.1085 ns | 1.00 | - |
| FrequencyBlock | .NET 8.0 | 8388608 | 59.074 ns | 0.0829 ns | 0.0735 ns | 0.99 | - |
| FrequencyBlockAvx | .NET 8.0 | 8388608 | 40.865 ns | 0.0844 ns | 0.0789 ns | 0.68 | - |
| FrequencyFlat | .NET 6.0 | 134217728 | 121.675 ns | 0.4978 ns | 0.4656 ns | 1.00 | - |
| FrequencyBlock | .NET 6.0 | 134217728 | 72.443 ns | 0.0492 ns | 0.0411 ns | 0.60 | - |
| FrequencyBlockAvx | .NET 6.0 | 134217728 | 48.987 ns | 0.0746 ns | 0.0662 ns | 0.40 | - |
| FrequencyFlat | .NET 8.0 | 134217728 | 67.616 ns | 0.2006 ns | 0.1877 ns | 1.00 | - |
| FrequencyBlock | .NET 8.0 | 134217728 | 68.747 ns | 0.0631 ns | 0.0527 ns | 1.02 | - |
| FrequencyBlockAvx | .NET 8.0 | 134217728 | 46.334 ns | 0.1333 ns | 0.1113 ns | 0.69 | - |
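
For readers comparing rows, the Ratio column appears to be each method's Mean divided by the Mean of the Flat baseline for the same runtime and size. The following is a minimal sketch that checks this against the Mac M2, .NET 6.0, Size = 1024 increment rows above. The `ratios` helper and the `means_ns` dictionary are illustrative names, not part of BitFaster.Caching or BenchmarkDotNet.

```python
# Mean times in ns, copied from the Mac M2 / .NET 6.0 / Size = 1024 rows above.
means_ns = {"IncFlat": 13.870, "IncBlock": 13.042, "IncBlockAvx": 6.736}


def ratios(means, baseline="IncFlat"):
    """Divide each mean by the baseline mean and round to two decimals.

    Assumption: this approximates how the Ratio column is derived.
    """
    base = means[baseline]
    return {name: round(mean / base, 2) for name, mean in means.items()}


result = ratios(means_ns)
for name, ratio in result.items():
    print(f"{name}: {ratio:.2f}")
```

The computed values match the reported ratios for these rows (1.00, 0.94, 0.49), so a ratio below 1.00 means the method is faster than the Flat baseline in that group.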

Windows Cobalt 100 (VM)

BenchmarkDotNet v0.13.12, Windows 11 (10.0.22000.2960/21H2/SunValley)
Pioneer, 1 CPU, 16 logical and 16 physical cores
.NET SDK 8.0.300
  [Host]   : .NET 6.0.30 (6.0.3024.21525), Arm64 RyuJIT AdvSIMD
  .NET 6.0 : .NET 6.0.30 (6.0.3024.21525), Arm64 RyuJIT AdvSIMD

Job=.NET 6.0  Runtime=.NET 6.0  Alloc Ratio=NA

[Column charts: Lfu SketchIncrement, Lfu SketchFrequency]

| Method | Size | Mean | Error | StdDev | Ratio | Allocated |
|---|---:|---:|---:|---:|---:|---:|
| IncFlat | 32768 | 23.19 ns | 0.013 ns | 0.012 ns | 1.00 | - |
| IncFlatAvx | 32768 | 23.41 ns | 0.017 ns | 0.015 ns | 1.01 | - |
| IncBlock | 32768 | 20.36 ns | 0.010 ns | 0.009 ns | 0.88 | - |
| IncBlockAvx | 32768 | 13.89 ns | 0.004 ns | 0.003 ns | 0.60 | - |
| IncFlat | 524288 | 64.55 ns | 1.262 ns | 2.243 ns | 1.00 | - |
| IncFlatAvx | 524288 | 64.56 ns | 1.251 ns | 1.753 ns | 1.01 | - |
| IncBlock | 524288 | 50.67 ns | 1.008 ns | 1.120 ns | 0.78 | - |
| IncBlockAvx | 524288 | 31.84 ns | 0.632 ns | 1.290 ns | 0.49 | - |
| IncFlat | 8388608 | 92.59 ns | 2.143 ns | 6.285 ns | 1.00 | - |
| IncFlatAvx | 8388608 | 91.21 ns | 2.231 ns | 6.507 ns | 0.99 | - |
| IncBlock | 8388608 | 82.57 ns | 1.642 ns | 4.710 ns | 0.90 | - |
| IncBlockAvx | 8388608 | 41.18 ns | 0.815 ns | 1.921 ns | 0.45 | - |
| IncFlat | 134217728 | 205.91 ns | 3.865 ns | 4.135 ns | 1.00 | - |
| IncFlatAvx | 134217728 | 205.51 ns | 2.790 ns | 2.473 ns | 0.99 | - |
| IncBlock | 134217728 | 183.31 ns | 3.610 ns | 5.061 ns | 0.89 | - |
| IncBlockAvx | 134217728 | 87.76 ns | 1.755 ns | 3.739 ns | 0.43 | - |

| Method | Size | Mean | Error | StdDev | Ratio | Allocated |
|---|---:|---:|---:|---:|---:|---:|
| FrequencyFlat | 32768 | 38.03 ns | 0.034 ns | 0.028 ns | 1.00 | - |
| FrequencyFlatAvx | 32768 | 38.05 ns | 0.040 ns | 0.037 ns | 1.00 | - |
| FrequencyBlock | 32768 | 25.98 ns | 0.006 ns | 0.005 ns | 0.68 | - |
| FrequencyBlockAvx | 32768 | 23.02 ns | 0.007 ns | 0.006 ns | 0.61 | - |
| FrequencyFlat | 524288 | 80.73 ns | 1.593 ns | 2.831 ns | 1.00 | - |
| FrequencyFlatAvx | 524288 | 81.06 ns | 1.606 ns | 3.056 ns | 1.01 | - |
| FrequencyBlock | 524288 | 56.54 ns | 1.009 ns | 1.381 ns | 0.70 | - |
| FrequencyBlockAvx | 524288 | 58.12 ns | 1.147 ns | 2.067 ns | 0.72 | - |
| FrequencyFlat | 8388608 | 111.51 ns | 2.220 ns | 6.077 ns | 1.00 | - |
| FrequencyFlatAvx | 8388608 | 113.42 ns | 2.580 ns | 7.525 ns | 1.02 | - |
| FrequencyBlock | 8388608 | 89.49 ns | 1.769 ns | 4.784 ns | 0.80 | - |
| FrequencyBlockAvx | 8388608 | 87.31 ns | 1.840 ns | 5.397 ns | 0.79 | - |
| FrequencyFlat | 134217728 | 226.23 ns | 4.506 ns | 4.215 ns | 1.00 | - |
| FrequencyFlatAvx | 134217728 | 223.52 ns | 2.770 ns | 2.591 ns | 0.99 | - |
| FrequencyBlock | 134217728 | 195.64 ns | 3.683 ns | 3.941 ns | 0.87 | - |
| FrequencyBlockAvx | 134217728 | 187.74 ns | 3.590 ns | 4.135 ns | 0.83 | - |

bitfaster, May 24 '24 05:05