Vectorized table select

Open mratsim opened this issue 5 years ago • 0 comments

The CMOV instruction that is used for conditional copy is likely optimal for 4~6 limbs.

From Agner Fog's instruction tables: https://www.agner.org/optimize/instruction_tables.pdf

The reciprocal throughput is 0.5 cycles, hence 2 independent CMOVs can be issued per cycle, and 2-3 cycles are required per Fp element of 4-6 limbs.

However, when we have a table precomputed for scalar multiplication/signing with 8 EC elements, each composed of 3 Fp coordinates of 4-6 limbs, using SSE or AVX we can load 2x2 or 2x4 64-bit limbs per cycle (2 vector loads of 128 or 256 bits per cycle, bottlenecked by memory speed).
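As a rough worked estimate of what is at stake for the 8-element table, assuming 64-bit limbs, a full constant-time scan of every entry, and counting only issue/load throughput (mask computation, the AND/OR work and memory latency are ignored, so these are lower bounds rather than measurements):

$$
N_{\text{limbs}} = 8 \times 3 \times n, \qquad n \in [4, 6] \;\Rightarrow\; 96 \le N_{\text{limbs}} \le 144
$$

$$
\text{cycles}_{\text{CMOV}} \ge \frac{N_{\text{limbs}}}{2} \in [48, 72], \qquad
\text{cycles}_{\text{256-bit loads}} \ge \frac{N_{\text{limbs}}}{2 \times 4} \in [12, 18]
$$

Here $n$ is the number of limbs per Fp coordinate, and the second bound assumes two 256-bit loads per cycle with 4 limbs each.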

This would reduce the overhead of table access. Note that LSB set recoding (#73) uses tables with 64 to 256 EC elements (192+ Fp, hence thousands of limbs).

Code to vectorize: https://github.com/mratsim/constantine/blob/00ff59910618d683c96b5bd4ec3972ba92990ce1/constantine/elliptic/ec_endomorphism_accel.nim#L200-L206
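To make the idea concrete, here is a minimal sketch in C of a mask-based table select that scans every entry, with an SSE2 path and a scalar fallback. It is not the Nim code linked above: the names `ec_point` and `table_select`, the flat 3 x 4 limb layout, and the table size of 8 are assumptions chosen for illustration. An AVX2 version would follow the same shape with 256-bit registers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical layout: 3 Fp coordinates of 4 64-bit limbs, 8 table entries. */
enum { LIMBS = 4, COORDS = 3, EC_LIMBS = LIMBS * COORDS, TABLE_SIZE = 8 };

typedef struct { uint64_t limbs[EC_LIMBS]; } ec_point;

/* Branch-free mask: all ones when i == index, all zeros otherwise. */
static uint64_t eq_mask(uint32_t i, uint32_t index) {
    uint64_t x = (uint64_t)(i ^ index);      /* 0 only when equal        */
    return (uint64_t)0 - ((x - 1) >> 63);    /* x == 0 wraps to top bit  */
}

/* Copies table[index] into out while reading every entry, so the memory
 * access pattern does not depend on index. Requires index < TABLE_SIZE. */
static void table_select(ec_point *out, const ec_point table[TABLE_SIZE],
                         uint32_t index) {
    assert(EC_LIMBS % 2 == 0);  /* public layout constant, not a secret */
#if defined(__SSE2__)
    __m128i acc[EC_LIMBS / 2];
    for (size_t j = 0; j < EC_LIMBS / 2; j++)
        acc[j] = _mm_setzero_si128();
    for (uint32_t i = 0; i < TABLE_SIZE; i++) {
        /* Broadcast the low 32 bits of the mask to every lane. */
        __m128i m = _mm_set1_epi32((int)(uint32_t)eq_mask(i, index));
        for (size_t j = 0; j < EC_LIMBS / 2; j++) {
            /* One 128-bit load covers two 64-bit limbs. */
            __m128i v = _mm_loadu_si128(
                (const __m128i *)&table[i].limbs[2 * j]);
            acc[j] = _mm_or_si128(acc[j], _mm_and_si128(m, v));
        }
    }
    for (size_t j = 0; j < EC_LIMBS / 2; j++)
        _mm_storeu_si128((__m128i *)&out->limbs[2 * j], acc[j]);
#else
    /* Scalar fallback using the same AND/OR accumulation per limb. */
    uint64_t acc[EC_LIMBS] = {0};
    for (uint32_t i = 0; i < TABLE_SIZE; i++) {
        uint64_t m = eq_mask(i, index);
        for (size_t j = 0; j < EC_LIMBS; j++)
            acc[j] |= table[i].limbs[j] & m;
    }
    for (size_t j = 0; j < EC_LIMBS; j++)
        out->limbs[j] = acc[j];
#endif
}
```

The accumulate-with-mask form replaces the per-limb CMOV with one load, one AND and one OR per vector. Whether a compiler keeps this branch-free should be checked in the generated assembly for any real use.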

mratsim, Aug 24 '20 20:08