utf_utils

Odd benchmark results for BH decoder

Open • hoehrmann opened this issue 6 years ago • 2 comments

Comparing slides #122 and #125 from the CppCon PDF, it seems the BH decoder is almost twice as fast converting UTF-8 to UTF-16 as it is converting UTF-8 to UTF-32 for the english_wiki.txt test case. I can reproduce that locally; the numbers I get are as follows:

******  UTF-8 to UTF-32 Conversion  ******

for file: 'english_wiki.txt'
UTF-8 to UTF-32 took  720 msec (360275/358131 units/points) (745 reps) (iconv)
UTF-8 to UTF-32 took 1641 msec (360275/358131 units/points) (745 reps) (llvm)
UTF-8 to UTF-32 took 1808 msec (360275/358131 units/points) (745 reps) (av)
UTF-8 to UTF-32 took 1484 msec (360275/358131 units/points) (745 reps) (std::codecvt)
UTF-8 to UTF-32 took  279 msec (360275/358131 units/points) (745 reps) (Boost.Text)
UTF-8 to UTF-32 took  748 msec (360275/358131 units/points) (745 reps) (hoehrmann)
UTF-8 to UTF-32 took  343 msec (360275/358131 units/points) (745 reps) (kewb-basic)
UTF-8 to UTF-32 took  180 msec (360275/358131 units/points) (745 reps) (kewb-fast)
UTF-8 to UTF-32 took  112 msec (360275/358131 units/points) (745 reps) (kewb-sse)

...

******  UTF-8 to UTF-16 Conversion  ******

for file: 'english_wiki.txt'
UTF-8 to UTF-16 took  850 msec (360275/358137 units/units) (745 reps) (iconv)
UTF-8 to UTF-16 took 1397 msec (360275/358137 units/units) (745 reps) (llvm)
UTF-8 to UTF-16 took 1592 msec (360275/358137 units/units) (745 reps) (std::codecvt)
UTF-8 to UTF-16 took  836 msec (360275/358137 units/units) (745 reps) (Boost.Text)
UTF-8 to UTF-16 took  443 msec (360275/358137 units/units) (745 reps) (hoehrmann)
UTF-8 to UTF-16 took  360 msec (360275/358137 units/units) (745 reps) (kewb-basic)
UTF-8 to UTF-16 took  178 msec (360275/358137 units/units) (745 reps) (kewb-fast)
UTF-8 to UTF-16 took   62 msec (360275/358137 units/units) (745 reps) (kewb-sse)

Since conversion to UTF-16 is a lot more work than conversion to UTF-32, that is a rather odd result. It is apparently not explained by memory throughput differences (UTF-32 probably touches more bytes than UTF-16), as other decoders seem largely unaffected.
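To make the "more work" point concrete, here is a minimal sketch of the output step for each target encoding. It is not taken from the benchmark code; `put_utf32` and `put_utf16` are hypothetical names, and both assume the code point has already been validated. The UTF-16 path needs a range check and, above U+FFFF, a subtraction, a shift, a mask and two stores. In the numbers above, the UTF-16 output is 358137 units for 358131 code points, which would be consistent with only six code points in english_wiki.txt needing a surrogate pair, so the extra branch should be almost perfectly predictable for this file.

```cpp
#include <cstddef>
#include <cstdint>

// Writes one code point as UTF-32: always exactly one 32-bit unit.
inline std::size_t put_utf32(char32_t cp, char32_t* out) {
    out[0] = cp;
    return 1;
}

// Writes one code point as UTF-16 and returns the number of units written.
// Assumption: cp is a valid scalar value (<= 0x10FFFF and not a surrogate).
inline std::size_t put_utf16(char32_t cp, char16_t* out) {
    if (cp < 0x10000) {
        // Basic Multilingual Plane: one unit, same value.
        out[0] = static_cast<char16_t>(cp);
        return 1;
    }
    // Supplementary plane: split the 20-bit offset into a surrogate pair.
    cp -= 0x10000;
    out[0] = static_cast<char16_t>(0xD800 + (cp >> 10));    // high surrogate
    out[1] = static_cast<char16_t>(0xDC00 + (cp & 0x3FF));  // low surrogate
    return 2;
}
```

For mostly-ASCII input the two functions differ by one compare-and-branch per code point, so the output step alone does not obviously account for a large gap in either direction.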

(My numbers are with GCC 5.4.0 on bare-metal Linux on an old i5 in power saving mode. It looks like the code now uses a much smaller repetition count than when you generated the data for the slides?)

hoehrmann, Nov 11 '18 00:11

Yes, I see similar behavior. I don't have a good explanation for it. FWIW, the difference is smaller with Clang.

As I recall, the timings in the slides were acquired with -rx 30 so that the reps yielded ~1GB of input text.

BobSteagall, Nov 11 '18 00:11

Hmm. Might be worth finding out. The original code was optimised for 32-bit targets. When I change the decode function to take uint_fast32_t or uint64_t values instead of uint32_t values, I already get a 33% speedup for UTF-32 with my gcc setup.
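One way to try that experiment without editing the function by hand is to make the integer type a template parameter. The sketch below is an assumption about the general shape of a table-driven decode step, not the code in this repository: `decode_step`, `to_utf32` and `make_toy_table` are hypothetical names, and the toy table only understands ASCII and two-byte sequences so that the example stays short. The real decoder's table is larger and is not reproduced here.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Toy table, NOT the real one: 256 byte classes, then 16 transitions per state.
// Classes: 0 = ASCII, 1 = continuation, 2 = two-byte lead, 3 = anything else.
// States:  0 = accept, 1 = reject, 2 = one continuation byte expected.
inline std::array<std::uint8_t, 256 + 3 * 16> make_toy_table() {
    std::array<std::uint8_t, 256 + 3 * 16> t{};
    for (std::size_t b = 0x80; b <= 0xFF; ++b) t[b] = 3;
    for (std::size_t b = 0x80; b <= 0xBF; ++b) t[b] = 1;
    for (std::size_t b = 0xC2; b <= 0xDF; ++b) t[b] = 2;
    for (std::size_t i = 256; i < t.size(); ++i) t[i] = 1;  // default: reject
    t[256 + 0 * 16 + 0] = 0;  // accept + ASCII        -> accept
    t[256 + 0 * 16 + 2] = 2;  // accept + lead         -> expect continuation
    t[256 + 2 * 16 + 1] = 0;  // expect + continuation -> accept
    return t;
}

// One DFA step. Word is the integer type under test (uint32_t, uint_fast32_t, uint64_t).
template <typename Word>
inline Word decode_step(const std::uint8_t* table, Word& state, Word& codep, Word byte) {
    const Word type = table[byte];
    codep = (state != 0) ? (byte & 0x3Fu) | (codep << 6)  // append 6 payload bits
                         : (0xFFu >> type) & byte;        // start a new code point
    state = table[256 + state * 16 + type];
    return state;
}

// Decodes n bytes into out; returns the code point count, or SIZE_MAX on malformed input.
template <typename Word>
std::size_t to_utf32(const std::uint8_t* table, const unsigned char* src,
                     std::size_t n, char32_t* out) {
    Word state = 0, codep = 0;
    std::size_t count = 0;
    for (std::size_t i = 0; i < n; ++i) {
        const Word s = decode_step<Word>(table, state, codep, static_cast<Word>(src[i]));
        if (s == 0) out[count++] = static_cast<char32_t>(codep);
        else if (s == 1) return static_cast<std::size_t>(-1);
    }
    return state == 0 ? count : static_cast<std::size_t>(-1);
}
```

Instantiating `to_utf32<std::uint32_t>`, `to_utf32<std::uint_fast32_t>` and `to_utf32<std::uint64_t>` over the same input gives three otherwise identical loops to time. Whether the wider types help will depend on the compiler and target; the 33% figure above is for one GCC setup only.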

hoehrmann, Nov 11 '18 01:11