laser
[Benchmarks] Clean up fp_reduction_latency benchmarks
The fp_reduction_latency benchmarks were the very first benchmark, optimization, and primitive code tested in Laser.
Unfortunately, that directory is currently very confusing.
It should be reorganized by what each file measures:
- Multiple accumulators: https://github.com/numforge/laser/blob/af191c086b4a98c49049ecf18f5519dc6856cc77/benchmarks/fp_reduction_latency/reduction_bench.nim and https://github.com/numforge/laser/blob/af191c086b4a98c49049ecf18f5519dc6856cc77/benchmarks/fp_reduction_latency/reduction_packed_accum.nim
- Raw vector intrinsics measurements: https://github.com/numforge/laser/blob/af191c086b4a98c49049ecf18f5519dc6856cc77/benchmarks/fp_reduction_latency/reduction_packed_sse.nim and https://github.com/numforge/laser/blob/af191c086b4a98c49049ecf18f5519dc6856cc77/benchmarks/fp_reduction_latency/reduction_sse_bench.nim
- Max/min implementation measurements: https://github.com/numforge/laser/blob/af191c086b4a98c49049ecf18f5519dc6856cc77/benchmarks/fp_reduction_latency/reduction_max_bench.nim
The reorganization should take https://github.com/nim-lang/Nim/issues/9514 into account.
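
For context on the "multiple accumulators" category, here is a minimal sketch of the general technique. It is written in C purely for illustration and is not taken from the linked Nim files. The function names `sum_single` and `sum_multi4` and the choice of four accumulators are assumptions made for this example.

```c
#include <stddef.h>

/* Single accumulator: every add depends on the result of the previous
   add, so the loop is limited by floating-point add latency. */
float sum_single(const float *data, size_t n) {
    float acc = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        acc += data[i];
    }
    return acc;
}

/* Four independent accumulators (hypothetical unroll factor): the four
   dependency chains can overlap in the CPU pipeline, so the loop is
   closer to being limited by add throughput than by add latency. */
float sum_multi4(const float *data, size_t n) {
    float acc0 = 0.0f, acc1 = 0.0f, acc2 = 0.0f, acc3 = 0.0f;
    size_t i = 0;

    /* Main loop: consume four elements per iteration. */
    for (; i + 4 <= n; i += 4) {
        acc0 += data[i];
        acc1 += data[i + 1];
        acc2 += data[i + 2];
        acc3 += data[i + 3];
    }

    /* Tail: fewer than four elements remain. */
    for (; i < n; ++i) {
        acc0 += data[i];
    }

    /* Combine the partial sums pairwise. */
    return (acc0 + acc1) + (acc2 + acc3);
}
```

Because floating-point addition is not associative, the two functions can return slightly different results on general inputs. That is also why compilers do not normally perform this transformation on their own without flags that relax floating-point semantics.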