stun
Speed up XORMappedAddress by inlining xorBytes
Description
Since xorBytes is only ever called on very small slices, the cost of the function call itself dominates its runtime. Benchmark results on linux/amd64:
name                          old time/op  new time/op  delta
XORMappedAddress_AddTo-8      43.5ns ±12%  33.1ns ± 1%  -23.91%  (p=0.000 n=10+9)
XORMappedAddress_GetFrom-8    26.2ns ± 7%  19.6ns ±20%  -25.36%  (p=0.000 n=10+9)
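For context, here is a minimal, self-contained sketch of the idea (not the pion/stun code itself; xorIPv4Inline, the cookie handling, and the helper signature are illustrative). For a 4-byte IPv4 address the XOR loop is so small that a separate helper mostly pays for the call overhead, which is exactly what inlining removes.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"net"
)

// magicCookie is the fixed STUN magic cookie (RFC 5389, section 6).
const magicCookie = 0x2112A442

// xorBytes is a stand-alone helper in the style of the original code:
// dst[i] = a[i] ^ b[i] for every index of dst.
func xorBytes(dst, a, b []byte) {
	for i := range dst {
		dst[i] = a[i] ^ b[i]
	}
}

// xorIPv4Inline is the "inlined" variant: the 4-byte loop is written out
// at the call site instead of going through a helper function.
func xorIPv4Inline(ip net.IP, cookie [4]byte) [4]byte {
	var out [4]byte
	v4 := ip.To4()
	for i := 0; i < 4; i++ {
		out[i] = v4[i] ^ cookie[i]
	}
	return out
}

func main() {
	var cookie [4]byte
	binary.BigEndian.PutUint32(cookie[:], magicCookie)

	ip := net.ParseIP("192.0.2.1")
	fmt.Printf("x-address (inline): %x\n", xorIPv4Inline(ip, cookie))

	// Same result via the helper, to show the two paths agree.
	dst := make([]byte, 4)
	xorBytes(dst, ip.To4(), cookie[:])
	fmt.Printf("x-address (helper): %x\n", dst)
}
```

The numbers above are in benchstat's old/new format: the helper-based and inlined versions compared over multiple benchmark runs.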
Reference issue
Closes #116.
See also #116, which does the exact opposite ;-) The logic being that this is not going to be visible in macro benchmarks, so we might as well centralise all of our XOR implementations in a single place.
I saw that, but then the question is whether you want all users to pull in a dependency with five asm implementations that are actually slower than a simple loop.
This looks like a prime example of premature optimization to me.
I support @jech's proposal to centralize the implementation in the transport package.
I really don't care either way: either this PR or #116 is perfectly fine with me. That said, I'd like to see one or the other applied, since I'm uncomfortable with the code duplication in the current code.
I've reviewed and merged #116. Hence this PR is obsolete.