Assembly for Arm v8.5-A ISA

Open albinahlback opened this issue 1 year ago • 1 comment

I'm sure everyone has noticed that Apple's M-series chips are basically as fast as state-of-the-art x86 processors (see GMP's benchmark results). I therefore think we should implement assembly routines for them as well.

These are the current routines that should be implemented:

  • [x] Hard(ish)coded multiplication (treated in #1808, works as a full replacement for mpn_mul_basecase)
  • [x] Hardcoded squaring (treated in #1912)
  • [x] Hardcoded high multiplication (treated in #1912)
  • [x] Hardcoded high squaring (treated in #1912)
  • [x] High multiplication, basecase (treated in #1912)
  • [ ] High squaring, basecase
  • [ ] Hardcoded low multiplication
  • [ ] Hardcoded low squaring
  • [ ] Low multiplication, basecase
  • [ ] Low squaring, basecase
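For reference, here is a minimal portable C sketch of what a full basecase multiplication and a low multiplication compute, which could serve as a correctness oracle when testing assembly versions. The names `limb_t`, `dlimb_t`, `ref_addmul_1`, `ref_mul_basecase` and `ref_mullow_n` are hypothetical and are not FLINT's or GMP's API. It assumes 64-bit limbs stored least significant first, a compiler with `unsigned __int128` (GCC or Clang), and that "low multiplication" means the low n limbs of the product. High multiplication is left out because its exact semantics are not stated in this issue.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t limb_t;            /* one 64-bit limb, least significant limb first */
typedef unsigned __int128 dlimb_t;  /* GCC/Clang extension holding a 128-bit product */

/* rp[0..n-1] += ap[0..n-1] * b; returns the carry-out limb.
   The sum a*b + r + carry is at most 2^128 - 1, so it cannot overflow. */
static limb_t ref_addmul_1(limb_t *rp, const limb_t *ap, size_t n, limb_t b)
{
    limb_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        dlimb_t t = (dlimb_t) ap[i] * b + rp[i] + carry;
        rp[i] = (limb_t) t;
        carry = (limb_t) (t >> 64);
    }
    return carry;
}

/* Full product: rp[0..m+n-1] = ap[0..m-1] * bp[0..n-1].
   rp must not overlap ap or bp. */
static void ref_mul_basecase(limb_t *rp, const limb_t *ap, size_t m,
                             const limb_t *bp, size_t n)
{
    for (size_t i = 0; i < m + n; i++)
        rp[i] = 0;
    /* Row j adds ap * bp[j] at offset j; limb m + j is still zero at that
       point, so the carry-out can simply be stored there. */
    for (size_t j = 0; j < n; j++)
        rp[m + j] = ref_addmul_1(rp + j, ap, m, bp[j]);
}

/* Low product: rp[0..n-1] = (ap * bp) mod 2^(64 n).
   Only partial products ap[i] * bp[j] with i + j < n are computed, and
   carries past limb n - 1 are discarded. rp must not overlap ap or bp. */
static void ref_mullow_n(limb_t *rp, const limb_t *ap, const limb_t *bp, size_t n)
{
    for (size_t i = 0; i < n; i++)
        rp[i] = 0;
    for (size_t j = 0; j < n; j++)
        (void) ref_addmul_1(rp + j, ap, n - j, bp[j]);
}
```

The low product touches roughly half of the partial products of the full one, which is why a dedicated routine can pay off. A test harness could compare the low n limbs written by `ref_mul_basecase` with the output of `ref_mullow_n`, and then compare both with the assembly routine under test.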

Useful links:

  1. https://dougallj.github.io/applecpu/firestorm.html
  2. https://dougallj.github.io/applecpu/firestorm-int.html
  3. https://dougallj.github.io/applecpu/firestorm-simd.html
  4. https://developer.arm.com/architectures/instruction-sets/intrinsics/
  5. https://developer.arm.com/documentation/ddi0602/2023-12?lang=en
  6. https://github.com/corsix/amx
  7. https://stackoverflow.com/questions/70717360/how-to-load-vector-registers-from-integer-registers-in-arm64-m1

albinahlback, Feb 27 '24 10:02

Currently on my arm_assembly branch:

mpn_mul vs flint_mpn_mul

m =   1: 4.67
m =   2: 4.68 3.61
m =   3: 4.01 3.30 3.04
m =   4: 2.89 2.39 2.27 2.18
m =   5: 3.03 2.21 1.95 2.02 2.04
m =   6: 2.64 1.97 1.82 1.89 2.18 2.05
m =   7: 2.32 1.79 1.99 1.68 1.76 1.79 1.83
m =   8: 2.13 1.69 1.61 1.59 1.70 1.74 1.81 1.79
m =   9: 1.96 1.63 1.57 1.53 1.63 1.64 1.64 1.71 1.77
m =  10: 1.81 1.49 1.48 1.47 1.51 1.63 1.60 1.69 1.73 1.75
m =  11: 1.75 1.50 1.45 1.46 1.45 1.48 1.51 1.53 1.57 1.56 1.58
m =  12: 1.63 1.37 1.39 1.47 1.51 1.57 1.69 1.78 1.67 1.58 1.58 1.61

Tested on cfarm103 (Apple M1)

albinahlback, Mar 01 '24 09:03