constantine icon indicating copy to clipboard operation
constantine copied to clipboard

precomp sqrt optimization

Open advaita-saha opened this issue 6 months ago • 9 comments

Optimization of square roots Tonelli-Shanks, with pre-computed dlog tables

Fixes #236

Notes about the approach :

  • https://ihagopian.com/posts/tonelli-shanks-with-precomputed-dlog-tables
  • https://hackmd.io/@jsign/bandersnatch-optimized-sqrt-notes

Reference Implementation from Gottfried in gnark https://github.com/GottfriedHerold/Bandersnatch/blob/f665f90b64892b9c4c89cff3219e70456bb431e5/bandersnatch/fieldElements/field_element_square_root.go

Currently pre-computes are added for Bandersnatch & Banderwagon

advaita-saha avatar Feb 02 '24 17:02 advaita-saha

Some more comments: I think we can totally remove the if checks inside the algorithm.

  1. This facilitates constant time implementation
  2. checking if the number is a square is easy once we have computed the candidate sqrt, we can just multiply it by itself, which is somewhat cheap, and it is what we do: https://github.com/mratsim/constantine/blob/5894a8da757bb935b681c73d1fdc92a7838f94ed/constantine/math/arithmetic/finite_fields_square_root.nim#L244-L257
  3. The case that matters and does NOT need constant-time is deserialization, and wrong deserialization means some protocol is not followed properly which means we need to stop interacting with the other party (if network) or the database (if local). This is useful for batch deserialization to shave 10~100 microseconds over thousands of points, but failing earlier after 5us instead of 10us is not necessary. (even for a DOS we're not in the milliseconds range here)

mratsim avatar Feb 03 '24 14:02 mratsim

I can confirm a 50% perf improvement on my machine :fire: 🎉

image

mratsim avatar Feb 03 '24 14:02 mratsim

I will be solving the constant time issues I wanted to make sure first that if the optimisation is working as expected

advaita-saha avatar Feb 04 '24 07:02 advaita-saha

Strangely I can't find the sqrt or deserialization benches in go-ipa.

mratsim avatar Feb 04 '24 09:02 mratsim

Strangely I can't find the sqrt or deserialization benches in go-ipa.

There isn't, I am pushing in a few hours Only bench that ignacio did on my request was of serialisation https://github.com/crate-crypto/go-ipa/pull/62

advaita-saha avatar Feb 04 '24 09:02 advaita-saha

Benchmarking code for go-ipa https://github.com/advaita-saha/go-ipa/blob/15686fb2ed7bf62b2c9883b17c77e3de1d09be0a/banderwagon/element_test.go#L393-L409

advaita-saha avatar Feb 04 '24 09:02 advaita-saha

I will be solving the constant time issues

I wanted to make sure first that if the optimisation is working as expected

Also if it's necessary I can give a benchmark on M2 Pro by coming Monday/Tuesday. Just to follow up on that there's no significant difference after performing precomp sqrt optimisation, moreover even the unoptimised and optimised are significantly slower on the M2 Pro chip with ARM arch

agnxsh avatar Feb 04 '24 10:02 agnxsh

Current Benchmarks Tested in AWS instance with AMD EPYC 7R13 Processor

Screenshot 2024-02-07 at 1 56 16 AM

Results

w.r.t go-ipa ~23% performance improvement 🔥 w.r.t constantine(prev-impl) ~50% performance improvement 🚀

advaita-saha avatar Feb 06 '24 20:02 advaita-saha

Confirmed on my machine:

go-ipa: image

Constantine: image

mratsim avatar Feb 10 '24 13:02 mratsim

It probably would be cleaner to rebase on top of master instead of merging master into this branch to keep a linear history.

mratsim avatar Feb 11 '24 17:02 mratsim

#Also @GottfriedHerold, it seems like your code has no license, can you clarify whether a direct port in Constantine (license MIT+Apache) is OK.

Yes, that's OK.

GottfriedHerold avatar Feb 11 '24 20:02 GottfriedHerold

@mratsim completed the suggested changes

advaita-saha avatar Feb 12 '24 12:02 advaita-saha