Speed up the branchless UTF-8 decoder by removing !len

Open · danielthegray opened this issue 4 years ago • 1 comment

In your post, you say: "Adding that !len is actually somewhat costly, though I couldn’t figure out why."

My suspicion was that the "!" operator is to blame: it returns 1 if its operand is 0 and 0 otherwise, so I suspected it might essentially behave like a branch.
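You can check whether `!len` really acts like a branch by reading the compiler output. GCC and Clang at -O2 typically lower `!x` to a compare followed by a set-on-condition instruction (such as `test`/`sete` on x86-64), not to a conditional jump. The sketch below is not the decoder's code. It compares that formulation with a purely arithmetic one. It assumes `len` always lies in 0..4, as with the decoder's length table, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers: each returns 1 if len is 0 and 0 otherwise. */

/* The original form. Compilers usually emit compare + setcc, not a jump. */
static int is_zero_not(int len)   { return !len; }

/* The same test written as an explicit comparison. */
static int is_zero_cmp(int len)   { return len == 0; }

/* Pure arithmetic, assuming 0 <= len <= 4: len - 1 wraps around to
 * 0xFFFFFFFF only when len == 0, so bit 31 is set exactly in that case. */
static int is_zero_shift(int len)
{
    return (int)((uint32_t)((uint32_t)len - 1u) >> 31);
}

/* Self-check: all three agree over the range the decoder can produce. */
static void check_is_zero(void)
{
    for (int len = 0; len <= 4; len++) {
        assert(is_zero_not(len) == is_zero_cmp(len));
        assert(is_zero_not(len) == is_zero_shift(len));
    }
}
```

All three forms are branch-free. If `!len` turns out to be costly, a long dependency chain is a more likely cause than a mispredicted branch. Only a look at the generated assembly for a specific compiler and target can confirm this.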

So my idea was to copy your table of lengths and create a second table of "error lengths" with the same effect: 0 where the length is valid and 1 where it signals an error (a length of 0). That way the decoder still moves forward at least one byte, as mentioned in the post.
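The sketch below shows how the proposed change might look. The `lengths` values are my reconstruction of the 32-entry table from the post, indexed by the top five bits of the lead byte (`s[0] >> 3`), and you should check them against the original. `error_lengths`, `next_with_not`, and `next_with_table` are hypothetical names, and only the "next pointer" step of the decoder appears here.

```c
#include <assert.h>

/* Assumed lead-byte length table, indexed by s[0] >> 3. A 0 marks a byte
 * that cannot start a sequence (10xxxxxx continuation bytes and 11111xxx). */
static const unsigned char lengths[32] = {
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    0, 0, 0, 0, 0, 0, 0, 0, 2, 2, 2, 2, 3, 3, 4, 0
};

/* Hypothetical companion table: 1 exactly where lengths[] is 0, so
 * len + error_lengths[i] is always at least 1. */
static const unsigned char error_lengths[32] = {
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1
};

/* Original formulation: advance by len, or by 1 when len is 0. */
static const unsigned char *next_with_not(const unsigned char *s)
{
    int len = lengths[s[0] >> 3];
    return s + len + !len;
}

/* Proposed formulation: a second table lookup replaces !len. */
static const unsigned char *next_with_table(const unsigned char *s)
{
    int len = lengths[s[0] >> 3];
    return s + len + error_lengths[s[0] >> 3];
}

/* Self-check: both versions agree for every possible lead byte and always
 * advance. buf has 4 bytes so s + 4 stays a valid one-past-the-end pointer. */
static void check_next(void)
{
    unsigned char buf[4] = {0};
    for (int b = 0; b < 256; b++) {
        buf[0] = (unsigned char)b;
        assert(next_with_not(buf) == next_with_table(buf));
        assert(next_with_table(buf) > buf);
    }
}
```

The change trades the `!` for a second load from a small table that will likely stay in L1 cache. Whether that wins seems to depend on the machine, as the two measurements in this thread show.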

The throughput went up from 504 MB/s to 557 MB/s on my machine.

Aug 21 '21 23:08 danielthegray

For what it's worth, with this patch applied I actually see throughput drop from ~647 MB/s to ~611 MB/s on my system (Ryzen 7 3700X).

Jun 23 '22 06:06 N-R-K