constantine
constantine copied to clipboard
precomp sqrt optimization
Optimization of square roots Tonelli-Shanks, with pre-computed dlog tables
Fixes #236
Notes about the approach :
- https://ihagopian.com/posts/tonelli-shanks-with-precomputed-dlog-tables
- https://hackmd.io/@jsign/bandersnatch-optimized-sqrt-notes
Reference Implementation from Gottfried in gnark
https://github.com/GottfriedHerold/Bandersnatch/blob/f665f90b64892b9c4c89cff3219e70456bb431e5/bandersnatch/fieldElements/field_element_square_root.go
Currently pre-computes are added for Bandersnatch & Banderwagon
Some more comments: I think we can totally remove the if
checks inside the algorithm.
- This facilitates constant time implementation
- checking if the number is a square is easy once we have computed the candidate
sqrt
, we can just multiply it by itself, which is somewhat cheap, and it is what we do: https://github.com/mratsim/constantine/blob/5894a8da757bb935b681c73d1fdc92a7838f94ed/constantine/math/arithmetic/finite_fields_square_root.nim#L244-L257 - The case that matters and does NOT need constant-time is deserialization, and wrong deserialization means some protocol is not followed properly which means we need to stop interacting with the other party (if network) or the database (if local). This is useful for batch deserialization to shave 10~100 microseconds over thousands of points, but failing earlier after 5us instead of 10us is not necessary. (even for a DOS we're not in the milliseconds range here)
I can confirm a 50% perf improvement on my machine :fire: 🎉
I will be solving the constant time issues I wanted to make sure first that if the optimisation is working as expected
Strangely I can't find the sqrt or deserialization benches in go-ipa.
Strangely I can't find the sqrt or deserialization benches in go-ipa.
There isn't, I am pushing in a few hours Only bench that ignacio did on my request was of serialisation https://github.com/crate-crypto/go-ipa/pull/62
Benchmarking code for go-ipa
https://github.com/advaita-saha/go-ipa/blob/15686fb2ed7bf62b2c9883b17c77e3de1d09be0a/banderwagon/element_test.go#L393-L409
I will be solving the constant time issues
I wanted to make sure first that if the optimisation is working as expected
Also if it's necessary I can give a benchmark on M2 Pro by coming Monday/Tuesday. Just to follow up on that there's no significant difference after performing precomp sqrt optimisation, moreover even the unoptimised and optimised are significantly slower on the M2 Pro chip with ARM arch
Current Benchmarks Tested in AWS instance with AMD EPYC 7R13 Processor
Results
w.r.t go-ipa
~23% performance improvement 🔥
w.r.t constantine(prev-impl)
~50% performance improvement 🚀
Confirmed on my machine:
go-ipa:
Constantine:
It probably would be cleaner to rebase on top of master instead of merging master into this branch to keep a linear history.
#Also @GottfriedHerold, it seems like your code has no license, can you clarify whether a direct port in Constantine (license MIT+Apache) is OK.
Yes, that's OK.
@mratsim completed the suggested changes