perf: Consider getting rid of `field/uint128`

Open · Yawning opened this issue · 1 comment

I'm not sure how bad the compiler behavior is on architectures other than amd64 (I don't have access to such targets), and this only affects platforms other than amd64 and arm64 (which have dedicated assembly), but https://github.com/golang/go/issues/29571 is (also) costing you a good amount of performance.

Unfortunately, giant walls of math/bits calls are less readable than the wrapper type, so whether this is acceptable depends on how you want to balance readability against speed.
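To make the trade-off concrete, here is a minimal Go sketch of the two styles. The `uint128` struct and the `mul64`/`addMul64` helpers are written in the spirit of the wrapper type the issue refers to, but these names and signatures are illustrative assumptions, not code copied from the package; `dotWrapped` and `dotInline` are hypothetical functions that compute the same sum of 128-bit products.

```go
package main

import (
	"fmt"
	"math/bits"
)

// uint128 is a hypothetical wrapper holding a 128-bit value as two 64-bit
// halves (an assumption modeled on the field/uint128 style, not the real code).
type uint128 struct {
	lo, hi uint64
}

// mul64 returns the full 128-bit product a * b.
func mul64(a, b uint64) uint128 {
	hi, lo := bits.Mul64(a, b)
	return uint128{lo, hi}
}

// addMul64 returns v + a*b, assuming the sum fits in 128 bits.
func addMul64(v uint128, a, b uint64) uint128 {
	hi, lo := bits.Mul64(a, b)
	lo, c := bits.Add64(lo, v.lo, 0)
	hi, _ = bits.Add64(hi, v.hi, c)
	return uint128{lo, hi}
}

// dotWrapped computes a0*b0 + a1*b1 + a2*b2 using the wrapper type.
func dotWrapped(a0, a1, a2, b0, b1, b2 uint64) uint128 {
	r := mul64(a0, b0)
	r = addMul64(r, a1, b1)
	r = addMul64(r, a2, b2)
	return r
}

// dotInline computes the same sum with bare math/bits calls, keeping the
// two 64-bit halves in separate local variables instead of a struct.
func dotInline(a0, a1, a2, b0, b1, b2 uint64) (hi, lo uint64) {
	var c uint64
	hi, lo = bits.Mul64(a0, b0)

	h, l := bits.Mul64(a1, b1)
	lo, c = bits.Add64(lo, l, 0)
	hi, _ = bits.Add64(hi, h, c)

	h, l = bits.Mul64(a2, b2)
	lo, c = bits.Add64(lo, l, 0)
	hi, _ = bits.Add64(hi, h, c)
	return hi, lo
}

func main() {
	const m = 1<<51 - 1 // example 51-bit limb value
	w := dotWrapped(m, m, m, m, m, m)
	hi, lo := dotInline(m, m, m, m, m, m)
	fmt.Println(w.hi == hi && w.lo == lo) // true
}
```

Both functions return identical results; the only difference is whether intermediate 128-bit values travel through a struct or through separate local variables, which is where the thread locates the performance cost. Whether the inline form actually compiles to faster code depends on the Go version and target.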

name        old time/op  new time/op  delta
Add-4       24.1ns ± 0%  24.1ns ± 0%  +0.17%
Multiply-4   135ns ± 0%   127ns ± 0%  -6.13%
Mult32-4    26.1ns ± 0%  26.1ns ± 0%  +0.11%
Square-4     102ns ± 0%    96ns ± 0%  -5.84%
Invert-4    27.3µs ± 0%  25.8µs ± 0%  -5.71%

name                           old time/op  new time/op  delta
MultiScalarMultSize8-4         1.23ms ± 0%  1.17ms ± 0%  -4.26%
ScalarBaseMult-4                101µs ± 0%    96µs ± 0%  -4.45%
ScalarMult-4                    360µs ± 0%   342µs ± 0%  -4.90%
VarTimeDoubleScalarBaseMult-4   351µs ± 0%   332µs ± 0%  -5.46%

NB: I only ran one iteration, on an amd64 target with Go 1.17beta1 and the `purego` build tag, so there's some noise in the comparison, but the difference is statistically significant and noticeable.

Aug 03 '21 15:08 Yawning

Hmm, this is a hard one, thank you for trying it out. I value the readability of those functions a lot, for both maintainability and education purposes. I'll make a PR, but probably won't merge it and will instead use it to push for a fix to golang/go#29571. Hopefully by Go 1.18 it won't be a problem anymore.

> this only impacts non-amd64/arm64 (due to dedicated assembly)

Note that the arm64 assembly is just a tiny carryPropagate core, not the full Square and Multiply.

Aug 09 '21 09:08 FiloSottile