cuda-fixnum

Extended-precision modular arithmetic library that targets CUDA.

Results 38 cuda-fixnum issues

Guidance is provided [here](https://docs.nvidia.com/cuda/cuda-c-programming-guide/#independent-thread-scheduling-7-x).

- [Accelerate](https://github.com/AccelerateHS/accelerate)
- [Accelerate-bignum](https://github.com/tmcdonell/accelerate-bignum)
- [Λ∘λ-accelerate](https://github.com/tmcdonell/Lol-accelerate)

When running the test suite, `modexp` (CLNW) seems faster than `multi_modexp` (k-ary), at least in the 128- and 256-byte range. This doesn't really make sense, since CLNW branches...

According to Koç, VLNW beats CLNW by a few percent, though I don't see how. (Note that in their respective sliding-window sections, MCA describes CLNW and HAC describes VLNW.)
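For readers unfamiliar with the terminology, here is a minimal sketch of sliding-window exponentiation with constant-length nonzero windows. It is not the library's code: the names `Window`, `partition_windows` and `modexp_sliding` are hypothetical, it works on single machine words rather than fixnums, and it assumes a modulus below 2^32 so that products fit in 64 bits. The exponent is scanned from the least significant bit and split into runs of zeros and `d`-bit windows that start at a set bit, so only odd powers of the base need to be precomputed.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// One window of the exponent: `len` bits with numeric value `value`.
// Zero windows have value 0; nonzero windows are always odd.
struct Window {
    unsigned len;
    std::uint32_t value;
};

// Assumes a, b < m < 2^32, so a * b cannot overflow 64 bits.
static std::uint64_t mulmod(std::uint64_t a, std::uint64_t b, std::uint64_t m) {
    return (a * b) % m;
}

// Split e into windows, least significant window first.
// Assumes 1 <= d <= 16.
std::vector<Window> partition_windows(std::uint64_t e, unsigned d) {
    std::vector<Window> ws;
    while (e != 0) {
        if ((e & 1) == 0) {
            // Zero window: absorb the whole run of zero bits.
            unsigned len = 0;
            while ((e & 1) == 0) { e >>= 1; ++len; }
            ws.push_back({len, 0});
        } else {
            // Nonzero window: exactly d bits, low bit set.
            std::uint32_t v = static_cast<std::uint32_t>(e & ((1u << d) - 1));
            ws.push_back({d, v});
            e >>= d;
        }
    }
    return ws;
}

// Computes g^e mod m using the windows above.
std::uint64_t modexp_sliding(std::uint64_t g, std::uint64_t e,
                             std::uint64_t m, unsigned d) {
    if (m == 1) return 0;
    g %= m;
    // odd[i] holds g^(2i+1), i.e. g^1, g^3, ..., g^(2^d - 1).
    std::vector<std::uint64_t> odd(std::size_t(1) << (d - 1));
    odd[0] = g;
    const std::uint64_t g2 = mulmod(g, g, m);
    for (std::size_t i = 1; i < odd.size(); ++i)
        odd[i] = mulmod(odd[i - 1], g2, m);

    std::uint64_t r = 1;
    const std::vector<Window> ws = partition_windows(e, d);
    // Process windows from most significant to least significant.
    for (auto it = ws.rbegin(); it != ws.rend(); ++it) {
        for (unsigned i = 0; i < it->len; ++i) r = mulmod(r, r, m);
        if (it->value != 0) r = mulmod(r, odd[it->value >> 1], m);  // data-dependent branch
    }
    return r;
}
```

The branch on `it->value` is the kind of data-dependent control flow that a fixed-window k-ary method avoids by multiplying on every window.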

At the moment the exponent window array is `malloc`ated once _per slot_ (see [modexp::modexp(...)](https://github.com/n1analytics/cuda-fixnum/blob/a5fc611f50f728e696971a86852df0ddb51b7f01/src/functions/modexp.cu#L103)), whereas it doesn't make a lot of sense to use the function unless all the exponents...

Basically they are primitives for the operation `A * B + C` on small matrices. See [here](https://docs.nvidia.com/cuda/cuda-c-programming-guide/#wmma).
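As a point of reference, the operation itself can be written as a scalar loop. The sketch below is plain host-side C++ with hypothetical names (`Tile`, `tile_mma`); it only shows what is computed on one small square tile, not how the hardware primitive is invoked. On the GPU the same computation is exposed through the `nvcuda::wmma` fragment API (`load_matrix_sync`, `mma_sync`, `store_matrix_sync`) described in the linked guide.

```cpp
#include <array>
#include <cstddef>

// An N x N tile stored row-major.
template <typename T, std::size_t N>
using Tile = std::array<std::array<T, N>, N>;

// Scalar reference for D = A * B + C on a single tile.
template <typename T, std::size_t N>
Tile<T, N> tile_mma(const Tile<T, N>& a, const Tile<T, N>& b, const Tile<T, N>& c) {
    Tile<T, N> d = c;  // start from the accumulator C
    for (std::size_t i = 0; i < N; ++i)
        for (std::size_t j = 0; j < N; ++j)
            for (std::size_t k = 0; k < N; ++k)
                d[i][j] += a[i][k] * b[k][j];
    return d;
}
```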

See [here](https://docs.nvidia.com/cuda/cuda-c-programming-guide/#cooperative-groups). Currently we simulate this with the slot layout template.

At the very least this involves considering side-channel attacks on GPUs, as well as data recovery from shared access to GPUs on providers like AWS. Some research in this area: -...

Implementations are largely done in [quorem.cu](cuda-fixnum/src/functions/quorem.cu) and [quorem_preinv.cu](cuda-fixnum/src/functions/quorem_preinv.cu), but they need significant refactoring and tidying.

Compiler is too dumb to do this apparently, causing a 3x slow-down on the `mul_lo` code. Not sure how to do it automatically using variadic parameter pack though...
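The snippet does not say which transformation is meant. Assuming it is compile-time loop unrolling (a guess, not something the source confirms), one way to drive it from a parameter pack is to expand a `std::index_sequence` so the loop body is instantiated once per index. Everything below is a hypothetical sketch: `unroll`, `unroll_impl` and `mul_lo_limb` are illustrative names, and the example multiplies an `N`-limb number by a single limb rather than reproducing the library's `mul_lo`.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>

// Calls f(0), f(1), ..., f(N-1), with the sequence expanded at compile time.
template <typename F, std::size_t... I>
inline void unroll_impl(F&& f, std::index_sequence<I...>) {
    // A braced initialiser list evaluates its elements left to right.
    int dummy[] = {0, (f(I), 0)...};
    (void)dummy;
}

template <std::size_t N, typename F>
inline void unroll(F&& f) {
    unroll_impl(std::forward<F>(f), std::make_index_sequence<N>{});
}

// Low N limbs of a * b, where a has N 32-bit limbs (least significant first)
// and b is a single limb. The carry out of the top limb is discarded.
template <std::size_t N>
void mul_lo_limb(std::uint32_t (&r)[N], const std::uint32_t (&a)[N], std::uint32_t b) {
    std::uint64_t carry = 0;
    unroll<N>([&](std::size_t i) {
        // (2^32 - 1)^2 + (2^32 - 1) < 2^64, so t cannot overflow.
        std::uint64_t t = std::uint64_t(a[i]) * b + carry;
        r[i] = static_cast<std::uint32_t>(t);
        carry = t >> 32;
    });
}
```

Whether this actually removes the slow-down would need to be checked in the generated code; the pack expansion only guarantees that no runtime loop counter remains in the source.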