stun icon indicating copy to clipboard operation
stun copied to clipboard

Speed up XORMappedAddress by inlining xorBytes

Open greatroar opened this issue 2 years ago • 2 comments

Description

Since xorBytes is only ever called for very small slices, the expensive part of it is the function call. Benchmark results on linux/amd64:

name                        old time/op    new time/op    delta
XORMappedAddress_AddTo-8      43.5ns ±12%    33.1ns ± 1%  -23.91%  (p=0.000 n=10+9)
XORMappedAddress_GetFrom-8    26.2ns ± 7%    19.6ns ±20%  -25.36%  (p=0.000 n=10+9)

Reference issue

Closes #116.

greatroar avatar Jun 11 '22 12:06 greatroar

See also #116, which does the exact opposite ;-) The logic being that this is not going to be visible in macro benchmarks, so we might as well centralise all of our XOR implementations in a single place.

jech avatar Jul 11 '22 21:07 jech

I saw that, but then the question is whether you want all users to pull in a dependency with five asm implementations that are actually slower than a simple loop.

greatroar avatar Jul 30 '22 09:07 greatroar

This looks like a prime example of premature optimization to me.

I support @jech proposal to centralize the implementation in the transport package.

stv0g avatar Nov 12 '22 15:11 stv0g

I really don't care either way ­— either this PR or #116 are perfectly fine with me. On the other hand, I'd like to see one or the other applied, since I'm uncomfortable with the code duplication that's happening in the current code.

jech avatar Nov 12 '22 17:11 jech

I've reviewed and merged #116. Hence this PR is obsolete.

stv0g avatar Nov 12 '22 17:11 stv0g