go-ethereum
go-ethereum copied to clipboard
Implement binary EEA inversion for faster BNADD precompile
Single field inversion takes almost all the time of execution in case of point addition. Switching from inversion by exponentiation into p-2 power to binary EEA inversion reduces execution time in 3 times
Before change:
BenchmarkG1AddAndMakeAffine-12 85004 13955 ns/op 256 B/op 5 allocs/op
After change:
BenchmarkG1AddAndMakeAffine-12 261412 4593 ns/op 256 B/op 5 allocs/op
Oooh, this sounds nice. Would it make sense to upstream it to Cloudflare's original library? https://github.com/cloudflare/bn256
Edit: I've opened an upstream issue referencing this https://github.com/cloudflare/bn256/issues/23.
This is not constant time.
I’m not sure if Cloudflare needs it constant time, but since element is not zero and inverse always exists then such inversion will finish in log(p) steps (for Eth purposes to be bounded computation).
N.B. Original EIP for this functionality does not give constant time guarantees anyway
EEA is not the recommeded way to perform inversion, because its execution is not performed in constant time with respect to the value of its input. EEA is well-known to be the source of several side-channel attacks, see for example ia.cr/2020/432.
Regarding performance, point operations can be performed in projective coordinates, and at the very end make just one conversion.
As was mentioned above there is no need to constant time for the precompile. And addition is indeed performed in projective coordinates, but Ethereum ABI requires outputs to be in affine coordinates, so fast inversion is important for single addition performance.
This change seems to be specific to the application, and it's likely not to be included in cloudflare/bn256.
You could add a method called MakeAffineInsecure which uses EEA to disinguish from the original MakeAffine method.
Had to do some duplication of code (to avoid global config variables), but now there are two flavors of 'constant time' and 'variable time' available functions. Fp6/Fp12 inversions do not benefit from variable time speedup much, but I've continued with the same approach everywhere. Miller loop and final exponentiations are left with constant time cause speedup there is also negligible.
Before and after:
name old time/op new time/op delta
PrecompiledBn256Add/chfast1-Gas=150-6 13.6µs ± 1% 5.1µs ± 1% -62.24% (p=0.016 n=5+4)
PrecompiledBn256Add/chfast2-Gas=150-6 13.6µs ± 2% 5.2µs ± 1% -61.84% (p=0.016 n=5+4)
PrecompiledBn256Add/cdetrio1-Gas=150-6 1.02µs ± 2% 1.01µs ± 1% ~ (p=0.286 n=5+5)
PrecompiledBn256Add/cdetrio2-Gas=150-6 1.05µs ± 1% 1.10µs ±20% ~ (p=0.952 n=5+5)
PrecompiledBn256Add/cdetrio3-Gas=150-6 1.06µs ± 2% 1.05µs ± 4% ~ (p=0.690 n=5+5)
PrecompiledBn256Add/cdetrio4-Gas=150-6 1.07µs ± 0% 1.10µs ± 2% +2.21% (p=0.029 n=4+4)
PrecompiledBn256Add/cdetrio5-Gas=150-6 1.04µs ± 3% 1.03µs ± 4% ~ (p=0.246 n=5+5)
PrecompiledBn256Add/cdetrio6-Gas=150-6 1.36µs ± 2% 1.45µs ±18% ~ (p=0.222 n=5+5)
PrecompiledBn256Add/cdetrio7-Gas=150-6 1.37µs ± 2% 1.44µs ±26% ~ (p=0.841 n=5+5)
PrecompiledBn256Add/cdetrio8-Gas=150-6 1.41µs ± 1% 1.56µs ±18% ~ (p=0.063 n=5+5)
PrecompiledBn256Add/cdetrio9-Gas=150-6 1.35µs ± 2% 1.38µs ± 1% +2.19% (p=0.032 n=5+4)
PrecompiledBn256Add/cdetrio10-Gas=150-6 1.37µs ± 7% 1.47µs ±23% ~ (p=0.151 n=5+5)
PrecompiledBn256Add/cdetrio11-Gas=150-6 13.9µs ± 1% 5.6µs ± 1% -59.74% (p=0.008 n=5+5)
PrecompiledBn256Add/cdetrio12-Gas=150-6 13.8µs ± 1% 5.8µs ±13% -57.82% (p=0.008 n=5+5)
PrecompiledBn256Add/cdetrio13-Gas=150-6 13.7µs ± 1% 5.4µs ± 4% -60.20% (p=0.008 n=5+5)
PrecompiledBn256Add/cdetrio14-Gas=150-6 2.05µs ± 2% 2.13µs ± 5% ~ (p=0.056 n=5+5)
It does a real difference in many cases (not all) - particularly these five:
PrecompiledBn256Add/chfast1-Gas=150-6 13.6µs ± 1% 5.1µs ± 1% -62.24% (p=0.016 n=5+4)
PrecompiledBn256Add/chfast2-Gas=150-6 13.6µs ± 2% 5.2µs ± 1% -61.84% (p=0.016 n=5+4)
PrecompiledBn256Add/cdetrio11-Gas=150-6 13.9µs ± 1% 5.6µs ± 1% -59.74% (p=0.008 n=5+5)
PrecompiledBn256Add/cdetrio12-Gas=150-6 13.8µs ± 1% 5.8µs ±13% -57.82% (p=0.008 n=5+5)
PrecompiledBn256Add/cdetrio13-Gas=150-6 13.7µs ± 1% 5.4µs ± 4% -60.20% (p=0.008 n=5+5)
However, the particular cases where it does make a difference, are the cases which are currently the slowest:
namely, these five:
name old mgas/s
PrecompiledBn256Add/chfast1-Gas=150-6 11.0 ± 1%
PrecompiledBn256Add/chfast2-Gas=150-6 11.0 ± 2%
PrecompiledBn256Add/cdetrio11-Gas=150-6 10.8 ± 1%
PrecompiledBn256Add/cdetrio12-Gas=150-6 10.9 ± 1%
PrecompiledBn256Add/cdetrio13-Gas=150-6 11.0 ± 1%
Which are an order of magnitude worse than others:
PrecompiledBn256Add/cdetrio1-Gas=150-6 147 ± 1%
PrecompiledBn256Add/cdetrio2-Gas=150-6 142 ± 1%
PrecompiledBn256Add/cdetrio3-Gas=150-6 142 ± 2%
PrecompiledBn256Add/cdetrio4-Gas=150-6 140 ± 0%
PrecompiledBn256Add/cdetrio5-Gas=150-6 144 ± 3%
PrecompiledBn256Add/cdetrio6-Gas=150-6 110 ± 2%
PrecompiledBn256Add/cdetrio7-Gas=150-6 110 ± 2%
PrecompiledBn256Add/cdetrio8-Gas=150-6 107 ± 2%
PrecompiledBn256Add/cdetrio9-Gas=150-6 111 ± 2%
PrecompiledBn256Add/cdetrio10-Gas=150-6 110 ± 7%
PrecompiledBn256Add/cdetrio14-Gas=150-6 73.3 ± 2%
As stated previously, constant time is not something we're bothered by. As long as this is mathematically correct, and doesn't suffer from a different set of worst-cases that we're as of yet unaware of, I'm all for merging this.
Constant time must be the default.
We geth-maintainers aren't cryptographers, and it's difficult for us to validate the proposed changes, other than that we can definitely agree that for us, constant time at the cost of speed, is not desireable.
However, I'd highly appreciate any input from @bwesterb or @armfazh regarding:
- Is the change correct, as in, assuming the implementation is correct, does the math hold?
- Would this lead to a new (even worse) set of worst-cases?
I don't expect you to spend a lot of time auditing our code, but if you have already looked at it, perhaps you already have formed opinions about these questions.
I created some fuzzers for this change here: https://github.com/MariusVanDerWijden/go-ethereum/tree/var_time_fuzzer Been running them for some time now and they haven't found a difference between the varTime and the constant time implementations.
@armfazh I think all your concerns has been addressed - none of the variable-time functions are on default paths. Do you think this can/will be merged upstream?
@armfazh I think all your concerns has been addressed - none of the variable-time functions are on default paths. Do you think this can/will be merged upstream?
could anyone of you prepare a PR against upstream, so we can do a final review and merge it then,
I started prepping an upstream PR, but for some reason the test fails. @shamatar do you have any idea why? I just copy-pasted the changes... https://github.com/holiman/bn256/tree/binary_eea_inversion
=== RUN TestBinaryEAA
TestBinaryEAA: bn256_test.go:232: results of different inversion do not agree
I'll have a look at the morning, sure.
I've identified a problem: modulus size in the original repo is 256 bits, so the trick with addNocarry and then right shift by 1 bit will not work cause addNocarry actually produces carry. In Ethereum the curve is different and modulus is 254 bits. I can fix it in a free time, but not sure about the time yet.
@shamatar friendly ping, was this ever looked at?
Hey @axic! I didn't, but the solution is quite simple, so since you have reminded me I should make a direct PR to the Cloudflare's repo, but for Ethereum the BN curve is different than in the original CF repo, so the trick with addNocarry is perfectly safe for Ethereum curve (adding two 254 bit numbers can not overflow 256 bit buffer) and can be used as-is