portable-snippets
Benchmarking program for unaligned accesses
We need a program to benchmark the different methods for the unaligned module. Shouldn't be difficult now that the clock module is in reasonable shape…
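As a starting point, here is a minimal sketch of what such a benchmark could look like. It is not code from the unaligned module: the names `load32_memcpy`, `load32_bytes`, `sum_loads`, and `time_loads` are hypothetical, only two candidate methods are compared, and the standard `clock()` stands in for the clock module.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

/* Hypothetical candidate 1: let memcpy perform the unaligned read. */
static uint32_t load32_memcpy(const void *p) {
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Hypothetical candidate 2: copy the bytes one at a time (native byte order,
 * so the result matches the memcpy version on any endianness). */
static uint32_t load32_bytes(const void *p) {
    const unsigned char *src = (const unsigned char *)p;
    uint32_t v;
    unsigned char *dst = (unsigned char *)&v;
    for (size_t i = 0; i < sizeof v; i++) {
        dst[i] = src[i];
    }
    return v;
}

typedef uint32_t (*load32_fn)(const void *);

/* Read consecutive 32-bit words starting `offset` bytes into the buffer and
 * add them up, so the loads cannot be optimized away. */
static uint32_t sum_loads(load32_fn load, const unsigned char *buf,
                          size_t len, size_t offset) {
    uint32_t sum = 0;
    for (size_t i = offset; i + sizeof(uint32_t) <= len; i += sizeof(uint32_t)) {
        sum += load(buf + i);
    }
    return sum;
}

/* Time `reps` passes over the buffer; returns CPU seconds as reported by
 * clock() and stores the accumulated checksum through `checksum`. */
static double time_loads(load32_fn load, const unsigned char *buf, size_t len,
                         size_t offset, int reps, uint32_t *checksum) {
    assert(checksum != NULL);
    uint32_t sum = 0;
    clock_t start = clock();
    for (int r = 0; r < reps; r++) {
        sum += sum_loads(load, buf, len, offset);
    }
    clock_t end = clock();
    *checksum = sum;
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Calling `time_loads` once per method for offsets 0 through 3 over a large buffer, and comparing the checksums between methods, would give a first rough comparison. Calling through a function pointer may prevent inlining, so a real benchmark would probably want one timing loop per method.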
@Cyan4973, did you do anything special for that blog post, or just benchmarking xxhash? If the former, I don't suppose you still have the code sitting around somewhere (and would be willing to share it)?
I guess I just used the internal benchmark module of xxhsum and lz4 (the -b option).
If you're on Linux x86, you can also consider uarch-bench to test the "raw" performance of loads/stores of various sizes and misalignments, perhaps as a baseline to compare to the psnip versions. It measures all 64-byte alignments, and the results align with what we know from published performance and optimization manuals. It uses small snippets of asm for the actual test code, which is the only thing that would need to be ported to make it work on other architectures.
x86, and especially Linux, are pretty well tested. So is ARM; this issue is mainly for more exotic architectures and compilers.
uarch-bench looks very cool, though; could be useful for SIMDe.
Right, that makes sense. I do want to support other mainstream architectures in uarch-bench, but that probably mostly just means ARM.