
Provide a SIMD implementation of swisstable_group_query suitable for ARM

Open · wesleywiser opened this issue 4 years ago · 2 comments

Briefly mentioned in #16, but as ARM devices become more popular, it would be great to have an accelerated implementation for them as well.

wesleywiser · Sep 20 '21 13:09

According to this comment in hashbrown, it might not be worth the trouble:

// Use the SSE2 implementation if possible: it allows us to scan 16 buckets
// at once instead of 8. We don't bother with AVX since it would require
// runtime dispatch and wouldn't gain us much anyways: the probability of
// finding a match drops off drastically after the first few buckets.
//
// I attempted an implementation on ARM using NEON instructions, but it
// turns out that most NEON instructions have multi-cycle latency, which in
// the end outweighs any gains over the generic implementation.
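For context on what "the generic implementation" scanning 8 buckets at once means, here is a minimal sketch of the portable SWAR (SIMD-within-a-register) technique: 8 control bytes are loaded into a `u64` and compared against the 7-bit hash fragment (`h2`) in a handful of integer operations. This is not odht's actual `swisstable_group_query` code; `GROUP_WIDTH`, `repeat`, `load_group`, `match_byte` and `matching_lanes` are hypothetical names chosen for illustration.

```rust
/// Number of control bytes examined per group in this portable sketch.
const GROUP_WIDTH: usize = 8;

/// Broadcast one byte into all 8 lanes of a u64 (hypothetical helper).
const fn repeat(byte: u8) -> u64 {
    u64::from_le_bytes([byte; GROUP_WIDTH])
}

/// Load 8 control bytes little-endian, so byte i occupies bits 8*i..8*i+8.
fn load_group(ctrl: &[u8; GROUP_WIDTH]) -> u64 {
    u64::from_le_bytes(*ctrl)
}

/// Return a mask with the high bit set in each lane whose byte equals `h2`.
/// Like the classic "has zero byte" trick this never misses a match, but a
/// borrow out of a matching lane can flag the lane above it as a false
/// positive, so callers must still compare the full key. The lowest flagged
/// lane is always a genuine match.
fn match_byte(group: u64, h2: u8) -> u64 {
    // Lanes equal to h2 become 0x00 after the XOR.
    let cmp = group ^ repeat(h2);
    // A lane's high bit survives only if subtracting 1 wrapped it below zero.
    cmp.wrapping_sub(repeat(0x01)) & !cmp & repeat(0x80)
}

/// Iterate over the lane indices flagged in a mask from `match_byte`,
/// lowest lane first.
fn matching_lanes(mut mask: u64) -> impl Iterator<Item = usize> {
    std::iter::from_fn(move || {
        if mask == 0 {
            None
        } else {
            let lane = (mask.trailing_zeros() / 8) as usize;
            mask &= mask - 1; // clear the lowest set bit
            Some(lane)
        }
    })
}
```

For example, probing the group `[0x10, 0x22, 0x33, 0x22, 0xFF, 0x80, 0x22, 0x01]` for `h2 = 0x22` flags lanes 1, 3 and 6. An SSE2 version does the same comparison on 16 bytes with a single byte-compare and movemask, which is the advantage the comment above refers to; ARM NEON has no direct movemask equivalent, which is part of why a NEON port is less straightforward.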

Also, according to local benchmarks someone ran for me on an M1 Mac mini, the non-SIMD version running there still easily outperformed the SIMD version running on an AMD Ryzen 9 5900X 😃

michaelwoerister · Sep 20 '21 13:09

I just found this PR/discussion in the hashbrown repo: https://github.com/rust-lang/hashbrown/pull/269. Very interesting!

michaelwoerister · Oct 11 '21 09:10