base64
[WIP] SIMD acceleration
Adds SIMD acceleration via libbase64. Depends on the yet-to-be-released Haskell bindings library, libbase64-bindings.
Adds a cabal flag, simd, to enable SIMD. Enabled by default.
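As an illustration of how a default-on Cabal flag such as simd can select a backend at compile time, here is a minimal sketch using CPP. The macro name BASE64_SIMD and the cabal stanza quoted in the comment are assumptions for illustration only; the PR may wire the flag up differently.

```haskell
{-# LANGUAGE CPP #-}
-- Sketch: compile-time backend selection driven by a Cabal flag.
-- ASSUMPTION: the cabal file contains something like
--   if flag(simd)
--     cpp-options: -DBASE64_SIMD
-- BASE64_SIMD is a hypothetical macro name, not taken from the PR.

-- | Name of the backend compiled into this build (illustrative only).
backendName :: String
#ifdef BASE64_SIMD
backendName = "simd (libbase64)"
#else
backendName = "scalar fallback"
#endif
```

With a setup like this, building with `cabal build --flags=-simd` would produce the scalar-only variant, while the default build defines the macro and links the SIMD path.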
Benchmarks
I'm running this on a relatively quiet box, with NixOS/Linux installed.
CPU Info of the machine where I'm running these benchmarks:
```
❯ cat /proc/cpuinfo
...truncated...
processor : 7
vendor_id : GenuineIntel
cpu family : 6
model : 158
model name : Intel(R) Core(TM) i7-7700HQ CPU @ 2.80GHz
stepping : 9
microcode : 0x48
cpu MHz : 3473.454
cache size : 6144 KB
...truncated...
❯ cat /proc/cpuinfo | egrep -o '(avx|avx2|ssse3|sse4_1|sse4_2)' | sort -u
avx
avx2
sse4_1
sse4_2
ssse3
```
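The egrep pipeline above matches substrings, so on a CPU that reports, say, avx512f, it would also print avx even if that exact word were absent. A minimal Haskell sketch that instead looks for whole-word matches on the "flags" lines of /proc/cpuinfo; the names wantedFlags and simdFlags are illustrative and not part of the PR.

```haskell
import Data.List (isPrefixOf, nub, sort)

-- | SIMD feature names of interest (the same set the egrep command searches for).
wantedFlags :: [String]
wantedFlags = ["avx", "avx2", "ssse3", "sse4_1", "sse4_2"]

-- | Given the contents of /proc/cpuinfo, return the wanted flags that appear
-- as whole words on any line starting with "flags", sorted and de-duplicated.
simdFlags :: String -> [String]
simdFlags cpuinfo =
  sort . nub $
    [ w
    | l <- lines cpuinfo
    , "flags" `isPrefixOf` l
    , w <- words (drop 1 (dropWhile (/= ':') l)) -- text after the colon
    , w `elem` wantedFlags
    ]
```

On a live Linux machine this could be used as `readFile "/proc/cpuinfo" >>= print . simdFlags`.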
The current criterion benchmark reports can be downloaded and viewed in a browser.
For small ByteStrings there is a significant penalty for going through SIMD, so we should probably choose the encoding method based on the size of the ByteString. (EDIT: From experiments, I determined that 1000 bytes is a safe threshold for the encoding fallback and 250 bytes for decoding. This PR currently implements these thresholds only for encodeBase64_ and decodeBase64_.)
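The size guard described above can be sketched as a small dispatcher that picks the scalar fallback for short inputs and the SIMD path otherwise. This is a minimal sketch, not the PR's code: dispatchBySize and the two threshold names are hypothetical, and treating the threshold as "inputs strictly shorter than N bytes take the fallback" is an assumption.

```haskell
import qualified Data.ByteString as BS

-- Thresholds reported in this PR, in bytes. Whether the comparison is
-- strict (<) or not is an ASSUMPTION of this sketch.
encodeSimdThreshold, decodeSimdThreshold :: Int
encodeSimdThreshold = 1000
decodeSimdThreshold = 250

-- | Choose between a scalar and a SIMD implementation by input length.
-- Inputs shorter than the threshold take the scalar path, where the
-- fixed overhead of the SIMD call would dominate.
dispatchBySize
  :: Int                   -- ^ threshold in bytes
  -> (BS.ByteString -> r)  -- ^ scalar fallback (hypothetical)
  -> (BS.ByteString -> r)  -- ^ SIMD path (hypothetical)
  -> BS.ByteString
  -> r
dispatchBySize threshold scalar simd bs
  | BS.length bs < threshold = scalar bs
  | otherwise                = simd bs
```

An encoder would then be assembled as `dispatchBySize encodeSimdThreshold encodeScalar encodeSimd`, where encodeScalar and encodeSimd stand in for the existing pure implementation and the libbase64-backed one. Keeping the threshold as a parameter makes it easy to tune encode and decode independently, as the measurements suggest.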
Speedup summary tables (ByteStrings below the size thresholds are unaffected and therefore omitted):
encodeBase64_ Speedups:
| Size | vs old base64 | vs base64-bytestring |
|---|---|---|
| 1k | 1.17x | 3.07x |
| 10k | 3.45x | 12.15x |
| 100k | 4.53x | 11.71x |
| 100mm | 4.69x | 14.21x |
decodeBase64_ Speedups:
| Size | vs old base64 | vs base64-bytestring |
|---|---|---|
| 1k | 4.06x | 5.01x |
| 10k | 11.29x | 12.28x |
| 100k | 15.63x | 16.44x |
| 100mm | 13.87x | 19.48x |

For encode, it seems like a reasonable threshold for SIMD to kick in is 1000 bytes.
For decode, it's looking more like 250 bytes.
Let's get CI running and I'll merge.
@chessai any movement on this? I'm trying to decide whether to cut a release with or without SIMD support right now.