cuda-fixnum
Extended-precision modular arithmetic library that targets CUDA.
Could make use of [CUB's "block-wide collective primitives"](https://nvlabs.github.io/cub/group___block_module.html).
Generating all the 256-byte modexp test cases currently takes 3 hours (1 CPU). The `modexp_256` test cases take over 10 GiB, much of which is redundant.
First saw the idea [here](https://arxiv.org/abs/1303.0328).
It would relieve register pressure and allow data to be shared easily between warps. Any other benefits? It could cause bank conflicts when a single word is shared between multiple threads; this needs investigation.
Potentially useful instructions include:
- min and max without branching
- sum of absolute differences: `sad.u32`
- [funnel shift](https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#logic-and-shift-instructions-shf)
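For reference, the funnel shift concatenates two 32-bit words and shifts across the boundary, which is exactly the primitive needed for multi-word shifts of a fixnum. A portable sketch of what the wrap variant (`shf.r.wrap` / `__funnelshift_r`, which masks the shift to 0..31) computes:

```cpp
#include <cassert>
#include <cstdint>

// Model of a right funnel shift: shift the 64-bit concatenation hi:lo
// right by `shift` bits and return the low 32 bits of the result.
// The wrap variant masks the shift amount to the range 0..31.
static uint32_t funnelshift_r(uint32_t lo, uint32_t hi, uint32_t shift) {
    shift &= 31;  // wrap semantics; also keeps the C++ shift well-defined
    uint64_t concat = ((uint64_t)hi << 32) | lo;
    return (uint32_t)(concat >> shift);
}
```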
Seems like a leaky abstraction. For example, fixnum implementations probably want right/left shift and rotation functions defined in terms of the `slot_layout` shuffles.
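To illustrate the idea: with one word of a fixnum per thread, a rotation by whole words is just a shuffle, where each thread reads the word held by another lane in its slot. This is a host-side model, not library code; on the device the indexed read would be a `__shfl_sync` within the slot, and the word ordering assumed here is illustrative:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Model of a slot of WIDTH "threads", each holding one word of a fixnum.
constexpr size_t WIDTH = 4;

// Rotate the fixnum right by k whole words: thread i takes the word
// previously held by thread (i + k) % WIDTH. On the GPU this indexed
// read is the slot_layout shuffle.
static void rotate_words_right(uint32_t slot[WIDTH], unsigned k) {
    uint32_t out[WIDTH];
    for (size_t i = 0; i < WIDTH; ++i)
        out[i] = slot[(i + k) % WIDTH];  // models the shuffle source lane
    for (size_t i = 0; i < WIDTH; ++i)
        slot[i] = out[i];
}
```

Bit-granular shifts would combine such a shuffle with a funnel shift of each word against its neighbour.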
There is an implementation of this in the `attic` that definitely performs better than the full version. It would be nicer still to have a single version that works well...
`assert`s in device code are surprising in two ways: (i) they are compiled in even without debugging flags, and (ii) they are somewhat expensive. They should be kept in for debug builds only.
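One conventional way to get that behaviour is an assert macro that is explicitly compiled out unless the build is a debug build; the macro name here is hypothetical, not the library's:

```cpp
#include <cassert>
#include <cstdio>

// Compiled out entirely unless the build defines a debug configuration,
// so release device code pays no cost for the checks.
#if !defined(NDEBUG)
#define FIXNUM_ASSERT(cond) assert(cond)
#else
#define FIXNUM_ASSERT(cond) ((void)0)  // no code emitted in release builds
#endif
```

Passing `-DNDEBUG` in release builds then strips the checks, matching the behaviour of host-side `assert`.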
Montgomery reduction requires an odd modulus, and that is the only case implemented at present. Even moduli could be handled by having special code for moduli that are powers of...
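The oddness requirement comes from the precomputed factor n' = -n⁻¹ mod R, which exists only when gcd(n, R) = 1, i.e. when n is odd (R being a power of 2). A single-word sketch of REDC with R = 2⁶⁴, illustrative rather than the library's multi-word implementation:

```cpp
#include <cassert>
#include <cstdint>

// Single-word Montgomery reduction: returns t * R^{-1} mod n for
// t < n*R, where R = 2^64 and n_prime = -n^{-1} mod R.
static uint64_t mont_redc(__uint128_t t, uint64_t n, uint64_t n_prime) {
    uint64_t m = (uint64_t)t * n_prime;              // m = t * n' mod R
    __uint128_t u = (t + (__uint128_t)m * n) >> 64;  // exact division by R
    uint64_t r = (uint64_t)u;
    return r >= n ? r - n : r;                       // u < 2n, one subtract
}

// Newton iteration for n' = -n^{-1} mod 2^64; converges only for odd n.
static uint64_t mont_ninv(uint64_t n) {
    uint64_t x = n;              // for odd n, n is its own inverse mod 8
    for (int i = 0; i < 5; ++i)
        x *= 2 - n * x;          // each step doubles the correct low bits
    return (uint64_t)0 - x;      // negate to get -n^{-1} mod 2^64
}
```

For a power-of-two modulus the reduction is just a mask, which is presumably the special case the note above is pointing at.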