stun
Speed up XORMappedAddress by inlining xorBytes
Description
Since xorBytes is only ever called on very small slices, the cost of the function call itself dominates its runtime. Benchmark results on linux/amd64:
name                          old time/op  new time/op  delta
XORMappedAddress_AddTo-8      43.5ns ±12%  33.1ns ± 1%  -23.91%  (p=0.000 n=10+9)
XORMappedAddress_GetFrom-8    26.2ns ± 7%  19.6ns ±20%  -25.36%  (p=0.000 n=10+9)
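For context, here is a minimal, self-contained sketch of the idea (not the pion/stun code itself; xorIPv4Inline, the cookie handling, and the helper signature are illustrative). For a 4-byte IPv4 address the XOR loop is so small that a separate helper mostly pays for the call overhead, which is exactly what inlining removes.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"net"
)

// magicCookie is the fixed STUN magic cookie (RFC 5389, section 6).
const magicCookie = 0x2112A442

// xorBytes is a stand-alone helper in the style of the original code:
// dst[i] = a[i] ^ b[i] for every index of dst.
func xorBytes(dst, a, b []byte) {
	for i := range dst {
		dst[i] = a[i] ^ b[i]
	}
}

// xorIPv4Inline is the "inlined" variant: the 4-byte loop is written out
// at the call site instead of going through a helper function.
func xorIPv4Inline(ip net.IP, cookie [4]byte) [4]byte {
	var out [4]byte
	v4 := ip.To4()
	for i := 0; i < 4; i++ {
		out[i] = v4[i] ^ cookie[i]
	}
	return out
}

func main() {
	var cookie [4]byte
	binary.BigEndian.PutUint32(cookie[:], magicCookie)

	ip := net.ParseIP("192.0.2.1")
	fmt.Printf("x-address (inline): %x\n", xorIPv4Inline(ip, cookie))

	// Same result via the helper, to show the two paths agree.
	dst := make([]byte, 4)
	xorBytes(dst, ip.To4(), cookie[:])
	fmt.Printf("x-address (helper): %x\n", dst)
}
```

The numbers above are in benchstat's old/new format: the helper-based and inlined versions compared over multiple benchmark runs.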
Reference issue
Closes #116.
See also #116, which does the exact opposite ;-) The logic being that this is not going to be visible in macro benchmarks, so we might as well centralise all of our XOR implementations in a single place.
I saw that, but then the question is whether you want all users to pull in a dependency with five asm implementations that are actually slower than a simple loop.
This looks like a prime example of premature optimization to me.
I support @jech's proposal to centralize the implementation in the transport package.
I really don't care either way: either this PR or #116 is perfectly fine with me. That said, I'd like to see one or the other applied, since I'm uncomfortable with the code duplication in the current code.
I've reviewed and merged #116. Hence this PR is obsolete.