branchless-utf8
Speed up the branchless UTF-8 decoder by removing !len
In your post, you say: "Adding that !len is actually somewhat costly, though I couldn’t figure out why."
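For context, the expression under discussion belongs to the step that advances the input pointer past one encoded character. The sketch below is a minimal reconstruction, not the post's actual code. It assumes the decoder advances with an expression of the form `s + len + !len`, where `len` comes from a 32-entry `lengths` table indexed by the lead byte's top five bits (`s[0] >> 3`) and 0 marks bytes that cannot begin a sequence. The function name `advance_not` is hypothetical.

```c
/* Assumed layout: byte count for each value of s[0] >> 3.
 * 0 marks a byte that cannot start a sequence. */
static const unsigned char lengths[32] = {
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, /* 0x00..0x7F: ASCII */
    0, 0, 0, 0, 0, 0, 0, 0,                         /* 0x80..0xBF: continuation */
    2, 2, 2, 2,                                     /* 0xC0..0xDF: 2-byte lead */
    3, 3,                                           /* 0xE0..0xEF: 3-byte lead */
    4,                                              /* 0xF0..0xF7: 4-byte lead */
    0                                               /* 0xF8..0xFF: invalid */
};

/* Hypothetical helper: return a pointer just past the sequence at s.
 * !len evaluates to 1 when len == 0 and to 0 otherwise, so an invalid
 * lead byte still moves the pointer forward by exactly one byte. */
static const unsigned char *advance_not(const unsigned char *s)
{
    int len = lengths[s[0] >> 3];
    return s + len + !len;
}
```

In C, `!len` is an ordinary integer expression rather than control flow. Whether a given compiler lowers it to a jump or to a flag-setting instruction depends on the compiler and target, so the generated assembly is the place to confirm what it costs.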
My suspicion was that this is because the "!" operator essentially behaves like a branch, returning 1 if its operand is 0 and 0 otherwise.
So, my idea was to copy your table of lengths and create a second table of "error lengths" that produces the same effect: 0 when the lead byte is valid and 1 when it is an error. This still ensures that the decoder moves forward by at least one byte, as mentioned.
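The two-table approach described above might look roughly like this. It is a sketch under assumptions: the `lengths` layout is the assumed one (indexed by `s[0] >> 3`, 0 for bytes that cannot begin a sequence), and `errlens`, `advance_not`, `advance_table`, and `advances_agree` are hypothetical names. `errlens` holds 1 exactly where `lengths` holds 0, so adding it replaces `!len` with a plain table load. `advances_agree` checks all 256 lead-byte values to confirm that the substitution never changes where the pointer lands.

```c
/* Assumed lead-byte length table, indexed by s[0] >> 3. */
static const unsigned char lengths[32] = {
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, /* 0x00..0x7F */
    0, 0, 0, 0, 0, 0, 0, 0,                         /* 0x80..0xBF */
    2, 2, 2, 2, 3, 3, 4,                            /* 0xC0..0xF7 */
    0                                               /* 0xF8..0xFF */
};

/* Hypothetical "error lengths": 1 where lengths[] is 0, otherwise 0. */
static const unsigned char errlens[32] = {
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, /* 0x00..0x7F */
    1, 1, 1, 1, 1, 1, 1, 1,                         /* 0x80..0xBF */
    0, 0, 0, 0, 0, 0, 0,                            /* 0xC0..0xF7 */
    1                                               /* 0xF8..0xFF */
};

/* Original style: advance by len, plus one if len is 0. */
static const unsigned char *advance_not(const unsigned char *s)
{
    int len = lengths[s[0] >> 3];
    return s + len + !len;
}

/* Proposed style: a second lookup supplies the extra byte instead of !len. */
static const unsigned char *advance_table(const unsigned char *s)
{
    unsigned idx = s[0] >> 3;
    return s + lengths[idx] + errlens[idx];
}

/* Returns 1 if both strategies advance identically for every lead byte. */
static int advances_agree(void)
{
    unsigned char buf[4] = {0};
    for (int b = 0; b < 256; b++) {
        buf[0] = (unsigned char)b;
        if (advance_not(buf) != advance_table(buf))
            return 0;
    }
    return 1;
}
```

A single precomputed table holding `len + !len` for each index would remove the expression with one load instead of two. As the numbers in this thread suggest, though, the effect of any of these variants seems to depend on the machine, so each one is worth benchmarking.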
The throughput went up from 504 MB/s to 557 MB/s on my machine.
For what it's worth, I actually see the speed drop from ~647 MB/s to ~611 MB/s with this patch applied on my system (Ryzen 7 3700X).