
SIMD-vectorize operators using libhwy

Open risicle opened this issue 2 years ago • 0 comments

This is an offshoot of some experimentation I was doing with dexed. I wasn't really planning on developing it further, but in case it's useful to anyone I'll present it here.

This uses Google's Highway library (libhwy) to add SIMD versions of the most expensive parts of the synthesis. My crude testing suggests modest speed improvements of 10-20% on SSE2 through AVX2 targets, but on an AVX512 machine this easily doubles speed for me. An ARM NEON system showed an embarrassing 4% speedup.

Dexed doesn't have a test suite, but comparing the output against the existing scalar implementation showed a maximum relative error of ~0.003 between the two, which I'd attribute to a different order of operations in some places.
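The issue doesn't say how that error figure was computed. As a minimal sketch of one plausible way to compare a scalar render against a SIMD render, assuming both are available as float buffers: `max_relative_error` and its `floor` parameter are hypothetical names for illustration, not part of dexed.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of dexed: returns the largest relative
// difference between a reference (scalar) buffer and a candidate (SIMD)
// buffer. `floor` is an assumed guard against dividing by near-zero
// reference samples, which would otherwise blow up the ratio.
double max_relative_error(const std::vector<float>& reference,
                          const std::vector<float>& candidate,
                          double floor = 1e-9) {
    // Compare only the overlapping part if the lengths differ.
    const std::size_t n = std::min(reference.size(), candidate.size());
    double worst = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        const double ref = reference[i];
        const double diff = std::fabs(static_cast<double>(candidate[i]) - ref);
        const double scale = std::max(std::fabs(ref), floor);
        worst = std::max(worst, diff / scale);
    }
    return worst;
}
```

Rendering the same patch and note through both code paths and feeding the two buffers to a helper like this gives a single number to track across targets.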

I don't know whether you'd ever actually want to make dexed depend on libhwy, and this would probably need a bit more polish before it could be merged. I've tested it on only a limited variety of machines/architectures, haven't included options to disable vectorization support, and have only configured libhwy for single-dispatch (there is no single binary that detects CPU extensions at runtime, though I don't imagine that would be too hard to set up).

The feedback-based operator loops are way too hard to vectorize, so they are left alone.
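To illustrate why feedback gets in the way, here is a simplified floating-point sketch. It is not dexed's code (dexed uses fixed-point arithmetic and lookup tables), and the function names and the choice to average the last two outputs are assumptions made for the example. The point is only the loop-carried dependency: without feedback every sample can be computed independently, while with feedback each sample needs the previous outputs first.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Illustrative only: a floating-point stand-in for an FM operator.

// No feedback: sample i depends only on i, so the iterations are
// independent and a SIMD loop could compute several of them per step.
std::vector<float> render_plain(std::size_t n, float phase_inc, float gain) {
    std::vector<float> out(n);
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = gain * std::sin(phase_inc * static_cast<float>(i));
    }
    return out;
}

// With feedback: sample i is phase-modulated by the average of the two
// previous outputs (an assumed feedback form). Iteration i cannot start
// until iteration i-1 has finished, so lanes cannot run in parallel.
std::vector<float> render_feedback(std::size_t n, float phase_inc,
                                   float gain, float fb) {
    std::vector<float> out(n);
    float y1 = 0.0f;  // previous output
    float y2 = 0.0f;  // output before that
    for (std::size_t i = 0; i < n; ++i) {
        const float mod = fb * 0.5f * (y1 + y2);
        const float y =
            gain * std::sin(phase_inc * static_cast<float>(i) + mod);
        y2 = y1;
        y1 = y;
        out[i] = y;
    }
    return out;
}
```

With `fb` set to zero the two functions produce the same samples; any non-zero `fb` makes each output depend on the ones before it.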

risicle, Jul 26 '23 20:07