Can field arithmetic operations also be optimized for arm64?
I am developing a program using arkworks' groth16 SNARK library. In proof verification benchmarks, performance was fast enough in my local development environment but very slow in the embedded environment.
Profiling showed that most of the overhead occurs in ark-ff's field arithmetic operation (mul_assign).
The current arithmetic implementation of ark-ff appears to have inline assembly optimization only for x86_64.
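As far as I understand, an architecture-specific path like this is chosen at compile time. Below is a minimal sketch of that kind of gating. The function name `mul_backend` and the exact feature conditions are my own assumptions for illustration, not ark-ff's actual code.

```rust
/// Hypothetical helper (not part of ark-ff): reports which multiplication
/// path a build gated on these conditions would select.
pub fn mul_backend() -> &'static str {
    // `cfg!` is evaluated at compile time from the target and enabled CPU features.
    if cfg!(all(
        target_arch = "x86_64",
        target_feature = "adx",
        target_feature = "bmi2"
    )) {
        "x86_64 assembly (ADX + BMI2)"
    } else {
        // Every other target, including aarch64, takes the generic path.
        "portable Rust"
    }
}
```

On an aarch64 build (for example a Raspberry Pi) this returns "portable Rust". On x86_64 it reports the assembly path only when the build enables both features, for example with `-C target-feature=+adx,+bmi2`.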
The embedded environment uses the arm64 architecture on low-performance hardware such as a Raspberry Pi. Of course, the low hardware performance is the main cause, but considering mobile environments in general, I think arm64 optimization support is necessary.
I would like to ask whether you have any plans to support arm64 arithmetic optimization.
I found the related article linked below, and I think I understand why Montgomery multiplication is difficult to optimize on the arm64 ISA. Unless arm64 provides instructions equivalent to Intel ADX and BMI2, assembly optimization probably won't help much.
However, I don't know if this conclusion is accurate due to my limited knowledge of cryptography, so please close the issue if there are no other opinions after the final review.
https://research.nccgroup.com/2021/09/10/optimizing-pairing-based-cryptography-montgomery-multiplication-in-assembly/
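For reference, the operation in question can be sketched in portable Rust. This is a single-limb Montgomery multiplication with R = 2^64, written only to show the shape of the computation. It is not ark-ff's implementation (its fields use several 64-bit limbs), and all function names are hypothetical. The key primitive is the 64x64 -> 128-bit product: arm64 provides it as the `mul`/`umulh` instruction pair, while the x86_64 assembly path relies on `mulx` (BMI2) together with the `adcx`/`adox` carry chains (ADX).

```rust
/// Computes -n^{-1} mod 2^64 for odd n using Newton's iteration.
pub fn neg_inv_mod_2_64(n: u64) -> u64 {
    let mut inv: u64 = 1; // correct modulo 2 because n is odd
    for _ in 0..6 {
        // Each step doubles the number of correct low bits: 1, 2, 4, ..., 64.
        inv = inv.wrapping_mul(2u64.wrapping_sub(n.wrapping_mul(inv)));
    }
    inv.wrapping_neg()
}

/// Montgomery product a * b * R^{-1} mod n, with R = 2^64.
/// Assumes n is odd, n < 2^63, and a, b < n (so nothing overflows u128).
pub fn mont_mul(a: u64, b: u64, n: u64, n_prime: u64) -> u64 {
    // Full 128-bit product: the step that maps to mul/umulh or mulx.
    let t = (a as u128) * (b as u128);
    // m is chosen so that t + m * n is divisible by 2^64.
    let m = (t as u64).wrapping_mul(n_prime);
    let u = ((t + (m as u128) * (n as u128)) >> 64) as u64;
    // u < 2n, so one conditional subtraction completes the reduction.
    if u >= n { u - n } else { u }
}

/// Converts a into Montgomery form: a * R mod n.
pub fn to_mont(a: u64, n: u64) -> u64 {
    (((a as u128) << 64) % (n as u128)) as u64
}

/// Converts back from Montgomery form by multiplying with 1.
pub fn from_mont(a: u64, n: u64, n_prime: u64) -> u64 {
    mont_mul(a, 1, n, n_prime)
}
```

A multi-limb version repeats the multiply-accumulate step for every pair of limbs, which is why the speed of the carry-propagating additions matters so much for the 4-limb and 6-limb fields used in pairings.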