
Benchmarks

Open htot opened this issue 2 years ago • 6 comments

I did some automated benchmarking on my i7-10700 and on Edison (Merrifield, a dual-core Silvermont Atom without cache memory, similar to Baytrail) that I want to share here. Strictly, this issue is for reference only. It might be useful for finding the commits that caused substantial performance increases or decreases. All data were taken without OpenMP (1 thread only) and in x86_64 mode. On the i7 you will see some deviation, probably caused by frequency scaling / turbo boost. Don't let that disturb you. The data are in benchmarks.ods if you want to play with them yourself.

Below I filter out the most interesting commits.
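
For anyone who wants to repeat this kind of filtering on their own numbers, here is a minimal sketch. It is not the method used to pick the commits in this issue. It assumes the throughput for one codec has been exported as an ordered list of (commit, MB/s) pairs; the function name `flag_changes`, the 10% threshold, and the sample values are all hypothetical placeholders, not measurements.

```python
from typing import List, Tuple


def flag_changes(
    samples: List[Tuple[str, float]], threshold: float = 0.10
) -> List[Tuple[str, float]]:
    """Return (commit, relative_change) for every commit whose throughput
    differs from the previous commit by more than `threshold`."""
    flagged = []
    # Walk over consecutive pairs: (previous commit, current commit).
    for (_, prev), (commit, cur) in zip(samples, samples[1:]):
        if prev <= 0:
            continue  # skip invalid or missing measurements
        change = (cur - prev) / prev
        if abs(change) > threshold:
            flagged.append((commit, change))
    return flagged


# Placeholder values in MB/s, NOT taken from benchmarks.ods.
samples = [
    ("aaaaaaa", 100.0),
    ("bbbbbbb", 102.0),
    ("ccccccc", 150.0),
    ("ddddddd", 120.0),
]

for commit, change in flag_changes(samples):
    print(f"{commit}: {change:+.1%}")
```

On a machine with frequency scaling, the threshold probably needs to sit above the run-to-run noise, otherwise turbo boost jitter gets flagged as a regression.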

Encoding

Note that on Edison, SSSE3 encoding took a hit with 9a0d1b2.

(Plot: encode)

#    Hash     Commit message
24   3f3f31c  Fix build under Xcode
30   67ee3fd  SSSE3->AVX2 encoding optimization
76   a5b6739  SSSE3: enc: factor encoding loop into inline function
79   99977db  Generic64: enc: factor encoding loop into inline function
92   e2c6687  AVX2: enc: unroll inner loop
93   9a0d1b2  SSSE3: enc: unroll inner loop
96   bf7341f  Generic64: enc: unroll inner loop
114  b8b3c58  Generic64: enc: use 12-bit lookup table

Decoding

Especially for Edison it has been a bumpy ride, with great improvements (3f3f31c) and regressions (0a69845) on SSSE3, and likewise for PLAIN (cfa8bf7 and f538baa).

(Plot: decode)

#    Hash     Commit message
24   3f3f31c  Fix build under Xcode
29   cfa8bf7  Plain decoding optimization
35   0a69845  SSSE3->AVX2, NEON32 decoding optimization
85   6310c1f  SSSE3: dec: factor decoding loop into inline function
88   f538baa  Generic32: dec: factor decoding loop into inline function
100  495414b  AVX2: dec: unroll inner loop
101  5874921  SSSE3: dec: unroll inner loop

htot, May 18 '22 20:05