Albin Ahlbäck
> On my M2 Mac with the current implementation I get : > > ``` > #define N_GCDEXT_METHOD 1 /* 0.14% faster than 0 */ > #define N_MOD_VEC_ADD_METHOD 0 /*...
Perhaps it is better to try to utilize
```c
__asm__ volatile ("mrs\t%0, CNTVCT_EL0" : "=r" (cc));
```
as that is what [FFTW does](https://github.com/FFTW/fftw3/blob/master/kernel/cycle.h#L547). The granularity is pretty rough though (I...
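For illustration, a minimal sketch of how that instruction could be wrapped in a tick-reading helper. The name `toy_ticks` and the non-AArch64 fallback via C11 `timespec_get` are assumptions made for this example, not anything taken from FLINT or FFTW.

```c
#include <stdint.h>
#include <time.h>

/* Hypothetical helper: returns a tick count suitable for coarse timing.
   On AArch64 it reads the virtual counter register CNTVCT_EL0, which
   advances at a fixed system-counter frequency rather than once per CPU
   cycle. Elsewhere it falls back to wall-clock nanoseconds, which is only
   a stand-in so that the sketch compiles on other targets. */
static inline uint64_t toy_ticks(void)
{
#if defined(__aarch64__)
    uint64_t cc;
    __asm__ volatile ("mrs\t%0, CNTVCT_EL0" : "=r" (cc));
    return cc;
#else
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (uint64_t) ts.tv_sec * 1000000000u + (uint64_t) ts.tv_nsec;
#endif
}
```

Timing a region then amounts to taking the difference of two `toy_ticks()` readings around it. With a coarse counter the region would likely need to be repeated many times for the difference to be meaningful.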
Have you tried using native code for `nmod8`? It could be that the compiler is doing something weird.
Yes, but I think it can be nice to be even more specific and check after function calls. My thinking is that assertions should only be used to check (1)...
It would be nice to separate those that need normalization from those that do not. Not sure how to do that in a user-friendly way though.
Modified version of Algorithm 1 of [Thomas Pornin's paper](https://github.com/pornin/bingcd/blob/main/doc/bingcd.pdf)
> I really doubt that error message make up a significant fraction of the library size. Can we actually get a count? Not sure how to do that without turning...
Hmm, not all functions allow aliasing. Whenever there is a loop involved, I believe the function does not support aliasing (or it may support it only up to some specific iteration).
Generally, I think it is a bad idea performance-wise to try to allow aliasing for low-level multiplication functions.
I don't recall if the Arm semi-hardcoded multiplication routines allow for aliasing, but I'm pretty sure there are ranges where aliasing is not allowed.
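To make the aliasing concern concrete, here is a minimal sketch of a loop-based schoolbook multiplication. All names (`toy_mul_n`, `toy_overlap`, `toy_mul_n_alias_safe`) are hypothetical and the code uses 32-bit limbs for simplicity; it is not FLINT's implementation. The inner loop writes `r[i + j]` while later iterations still read `a` and `b`, so the output must not overlap the inputs. Supporting aliasing means going through a temporary buffer and an extra copy, which is the kind of overhead one would rather keep out of a low-level routine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical schoolbook multiply: r[0..2n) = a[0..n) * b[0..n).
   r must NOT overlap a or b, since r is written while a and b are
   still being read by later loop iterations. */
static void toy_mul_n(uint32_t *r, const uint32_t *a, const uint32_t *b, size_t n)
{
    memset(r, 0, 2 * n * sizeof(uint32_t));
    for (size_t i = 0; i < n; i++) {
        uint64_t carry = 0;
        for (size_t j = 0; j < n; j++) {
            /* Cannot overflow: (2^32-1)^2 + 2*(2^32-1) = 2^64 - 1. */
            uint64_t t = (uint64_t) a[i] * b[j] + r[i + j] + carry;
            r[i + j] = (uint32_t) t;
            carry = t >> 32;
        }
        r[i + n] = (uint32_t) carry;
    }
}

/* Returns nonzero if the limb ranges p[0..pn) and q[0..qn) overlap. */
static int toy_overlap(const uint32_t *p, size_t pn, const uint32_t *q, size_t qn)
{
    uintptr_t ps = (uintptr_t) p, pe = ps + pn * sizeof(uint32_t);
    uintptr_t qs = (uintptr_t) q, qe = qs + qn * sizeof(uint32_t);
    return ps < qe && qs < pe;
}

/* Alias-tolerant wrapper: pays for an allocation and a copy whenever
   the output overlaps an input. */
static void toy_mul_n_alias_safe(uint32_t *r, const uint32_t *a, const uint32_t *b, size_t n)
{
    if (toy_overlap(r, 2 * n, a, n) || toy_overlap(r, 2 * n, b, n)) {
        uint32_t *tmp = malloc(2 * n * sizeof(uint32_t));
        assert(tmp != NULL);
        toy_mul_n(tmp, a, b, n);
        memcpy(r, tmp, 2 * n * sizeof(uint32_t));
        free(tmp);
    } else {
        toy_mul_n(r, a, b, n);
    }
}
```

Calling `toy_mul_n(x, x, x, n)` directly would clobber the operands in the initial `memset`, whereas `toy_mul_n_alias_safe(x, x, x, n)` gives the expected square at the cost of the temporary.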