Faster point decompression
cc @ec2.
Point decompression is a bottleneck in protocols (and when loading trusted setups).
It's very slow due to the square root computation.
Storing points uncompressed would turn loading into a memcopy instead (something like 20000x faster).
This is even worse for non-sqrt-friendly curves like BLS12-377 or Bandersnatch.
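To illustrate where the cost comes from, a minimal Python sketch (a hypothetical helper with illustrative parameters, not Constantine's API): recovering y from a compressed x on a short Weierstrass curve y² = x³ + ax + b means taking a modular square root of the right-hand side.

```python
# Minimal sketch of point decompression on y^2 = x^3 + a*x + b over GF(p).
# Hypothetical helper, not Constantine's API; parameters are illustrative.

def decompress(x: int, sign: int, p: int, a: int, b: int) -> tuple[int, int]:
    """Recover (x, y) from the x-coordinate and a sign bit."""
    rhs = (pow(x, 3, p) + a * x + b) % p
    # For p = 3 (mod 4), a square root is a single exponentiation.
    # For fields with high 2-adicity (e.g. BLS12-377, Bandersnatch),
    # Tonelli-Shanks-style algorithms are needed and dominate the cost.
    y = pow(rhs, (p + 1) // 4, p)  # assumes p % 4 == 3
    if pow(y, 2, p) != rhs:
        raise ValueError("x is not on the curve")
    if y & 1 != sign:
        y = p - y
    return (x, y)
```

For p ≡ 3 (mod 4) the square root is one exponentiation, but the curves above have high 2-adicity, which is exactly where the optimizations referenced below matter.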
See:
- Tonelli-Shanks with precomputed dlog tables
  https://hackmd.io/@jsign/bandersnatch-optimized-sqrt-notes
  https://ihagopian.com/posts/tonelli-shanks-with-precomputed-dlog-tables
- Computing Square Roots Faster than the Tonelli-Shanks/Bernstein Algorithm
  Palash Sarkar
  https://eprint.iacr.org/2020/1407
- Optimized Discrete Logarithm Computation for Faster Square Roots in Finite Fields
  Thomas Pornin, 2023
  https://eprint.iacr.org/2023/828
  Courtesy of @yelhousni
From the paper's abstract:
For computing square roots in a finite field GF(q) where q-1 = 2ⁿm for an odd integer m and some integer n, the classic Tonelli-Shanks algorithm starts with an exponentiation (the exponent has size about log₂ q - n bits), followed by a discrete logarithm computation in the subgroup of 2ⁿ-th roots of unity in GF(q); the latter operation has cost O(n²) multiplications in the field, which is prohibitive when n is large. Bernstein proposed an optimized variant with lookup tables, leading to a runtime cost of O((n/w)²), using w-bit tables of cumulative size O(2ʷn). Sarkar recently improved on the runtime cost, down to O((n/w)^1.5), with the same overall storage cost. In this short note, we explore the use of a straightforward divide-and-conquer variant of the Pohlig-Hellman algorithm, bringing the asymptotic cost down to O(n log n), and further study some additional optimizations. The result appears to be competitive, at least in terms of number of multiplications, for some well-known fields such as the 224-bit field used in the NIST standard elliptic curve P-224 (for which n = 96).
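To make the cost structure in the abstract concrete, here is a textbook Tonelli-Shanks in Python (a reference sketch only, not Constantine's constant-time code): the exponentiation by (m+1)/2 is the "log₂ q - n bits" part, and the while-loop is the discrete log in the 2ⁿ-th roots of unity whose O(n²) multiplications the cited papers replace with lookup tables or divide-and-conquer.

```python
# Textbook Tonelli-Shanks: square root of a mod an odd prime q.
# Phase 1 is one big exponentiation; phase 2 is the O(n^2) discrete-log-style
# loop in the 2^n-th roots of unity that Bernstein/Sarkar/Pornin optimize.

def tonelli_shanks(a: int, q: int) -> int:
    """Return x with x*x == a (mod q), for q an odd prime."""
    if pow(a, (q - 1) // 2, q) != 1:
        raise ValueError("a is not a quadratic residue mod q")
    # Write q - 1 = 2^n * m with m odd.
    m, n = q - 1, 0
    while m % 2 == 0:
        m //= 2
        n += 1
    # Any quadratic non-residue z yields a generator c of the 2^n-th roots of unity.
    z = 2
    while pow(z, (q - 1) // 2, q) != q - 1:
        z += 1
    c = pow(z, m, q)
    x = pow(a, (m + 1) // 2, q)  # phase 1: exponent of ~log2(q) - n bits
    t = pow(a, m, q)             # error term; invariant: x^2 == a * t (mod q)
    big_m = n
    while t != 1:                # phase 2: dlog in the 2^n-th roots of unity
        # Find the least i with t^(2^i) == 1 (mod q).
        i, t2 = 0, t
        while t2 != 1:
            t2 = t2 * t2 % q
            i += 1
        b = pow(c, 1 << (big_m - i - 1), q)
        big_m = i
        c = b * b % q
        t = t * c % q
        x = x * b % q
    return x
```

Bernstein's variant precomputes w-bit tables of the 2ⁿ-th roots of unity so each pass of phase 2 resolves w bits at once, and Pornin's note instead splits the discrete log recursively, Pohlig-Hellman style, to reach O(n log n).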