Quickenshtein
WIP: SIMD Leftover Data
Simplifies how leftover data is processed. In some basic testing, it seems to perform better for small text (likely because there is less code to load), about the same with 128-bit processing at the sizes above that, and slightly worse with 256-bit processing.
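For context, "leftover data" here means the elements that remain once the input no longer fills a whole vector. The following is only a rough sketch of that general pattern in plain C, not the C# code in this PR: `fill_sequential` and `LANES` are hypothetical names, the sequential-fill behaviour is an assumption, and the inner loop stands in for a single 128-bit vector store.

```c
#include <stddef.h>
#include <stdint.h>

/* Four 32-bit values, standing in for one 128-bit vector (assumption). */
#define LANES 4

/* Hypothetical helper: writes start, start+1, ... into row[0..len). */
static void fill_sequential(int32_t *row, size_t len, int32_t start)
{
    size_t i = 0;
    int32_t value = start;

    /* Main loop: only full blocks of LANES elements are handled here. */
    for (; i + LANES <= len; i += LANES) {
        for (size_t lane = 0; lane < LANES; lane++) {
            row[i + lane] = value + (int32_t)lane;
        }
        value += LANES;
    }

    /* Leftover data: fewer than LANES elements remain, so one scalar
       loop finishes the row instead of a separate path per width. */
    for (; i < len; i++) {
        row[i] = value++;
    }
}
```

With a row size of 10, as in the smallest benchmark case, the main loop covers two blocks of four and the scalar loop writes the last two elements. A single shared tail keeps the code small, which fits the small-input result, though a wider vector leaves more elements for the tail.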
Before
Method | Job | EnvironmentVariables | RowSize | Mean | Error | StdDev | Op/s | Code Size | Allocated |
---|---|---|---|---|---|---|---|---|---|
Fill | Core (All Intrinsics) | Empty | 10 | 4.523 ns | 0.1255 ns | 0.1446 ns | 221,068,464.6 | 422 B | - |
Fill | Core (w/o AVX2) | COMPlus_EnableAVX2=0 | 10 | 4.754 ns | 0.1291 ns | 0.1435 ns | 210,360,789.7 | 385 B | - |
Fill | Core (All Intrinsics) | Empty | 300 | 20.146 ns | 0.4255 ns | 0.4900 ns | 49,637,004.7 | 422 B | - |
Fill | Core (w/o AVX2) | COMPlus_EnableAVX2=0 | 300 | 49.058 ns | 0.9581 ns | 0.8962 ns | 20,384,240.0 | 385 B | - |
Fill | Core (All Intrinsics) | Empty | 8102 | 497.672 ns | 9.9860 ns | 13.9990 ns | 2,009,355.1 | 422 B | - |
Fill | Core (w/o AVX2) | COMPlus_EnableAVX2=0 | 8102 | 830.389 ns | 15.7661 ns | 16.1907 ns | 1,204,255.5 | 385 B | - |
After
To supply later