cuda-fixnum

Extended-precision modular arithmetic library that targets CUDA.

Results 38 cuda-fixnum issues

~/cuda-fixnum# ./bench/bench
Function: mul_lo, #elts: 1e3
fixnum digit  total data   time       Kops/s
 bits  bits     (MiB)    (seconds)
2021-08-01 09:49:54 [ERROR] Sorry, cudaMallocManaged() is not implemented in the current version. If user...

Currently running through GTest, but it would be cleaner to run it via the Python interface developed in #59.

The initial estimate can be improved with a degree two approximation; also the iteration can be modified to cube the error, rather than simply square it. See [here](https://en.wikipedia.org/wiki/Division_algorithm#Variant_Newton-Raphson_division).
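The two ideas above can be sketched in plain double-precision C++. This is only an illustration of the iteration, not cuda-fixnum code: the function names `initial_estimate`, `cubic_step` and `reciprocal` are hypothetical, the divisor is assumed to be pre-scaled into [0.5, 1], and the quadratic coefficients are the ones given in the linked Wikipedia article (they give a starting error of at most 1/99 on that interval).

```cpp
#include <cassert>
#include <cmath>

// Degree-two initial estimate of 1/d, valid for d in [0.5, 1].
// Coefficients as given in the linked Wikipedia article (assumption:
// the divisor has already been scaled into this interval).
inline double initial_estimate(double d) {
    return 140.0 / 33.0 + d * (-64.0 / 11.0 + d * (256.0 / 99.0));
}

// One step that cubes the error e = 1 - d*x:
//   x' = x * (1 + e + e^2), hence 1 - d*x' = e^3.
inline double cubic_step(double d, double x) {
    double e = 1.0 - d * x;
    double y = x * e;
    return x + y + y * e;
}

// Hypothetical helper: approximate 1/d with a fixed number of cubic steps.
inline double reciprocal(double d, int iterations) {
    assert(d >= 0.5 && d <= 1.0);
    double x = initial_estimate(d);
    for (int i = 0; i < iterations; ++i) {
        x = cubic_step(d, x);
    }
    return x;
}
```

With a starting error of at most 1/99, one step brings the error to about 1e-6 and a second step takes it below double-precision rounding, at the cost of one extra multiply-add per step compared with the quadratic iteration.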

In the modular exponentiation algorithms, there are repeated mulmods of the form `A

The current C++ API does not make it especially easy to write a high-level language interface. Change it so that it does. Codependent with #59.

Implement a Python interface based on the results of the analysis in #58. Codependent with #60.

Currently, to assert that addition doesn't overflow, we
```
add_cy(s, cy, a, b);
assert(digit::is_zero(cy));
```
If `NDEBUG` is not set, then this checks that overflow hasn't occurred. If `NDEBUG` is...
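As a self-contained illustration of the pattern, here is a minimal host-side sketch on a single 32-bit digit. The `add_cy` below is a hypothetical stand-in, not the library's implementation, and `add_checked` is an invented name. It relies only on the standard behaviour of `assert`, which expands to nothing when `NDEBUG` is defined, so the check disappears in release builds while the carry is still computed.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for an add-with-carry on one 32-bit digit.
inline void add_cy(std::uint32_t &s, std::uint32_t &cy,
                   std::uint32_t a, std::uint32_t b) {
    s = a + b;               // unsigned addition wraps modulo 2^32
    cy = (s < a) ? 1u : 0u;  // carry out exactly when the sum wrapped
}

// Hypothetical wrapper: add two digits, asserting that no carry occurred.
inline std::uint32_t add_checked(std::uint32_t a, std::uint32_t b) {
    std::uint32_t s, cy;
    add_cy(s, cy, a, b);
    assert(cy == 0);  // removed by the preprocessor when NDEBUG is defined
    (void)cy;         // silences the unused-variable warning under NDEBUG
    return s;
}
```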

Including, in no particular order:
- [ ] Generate data and graph it, rather than generating large difficult-to-interpret tables.
- [ ] Generate data for all functions.
- [ ]...

Something like this, for example:
```
if (__all(bits < digit::BITS))
    algo_small_params(...);
else
    algo_generic(...);
```

For example:
```
Function: mul_lo, #elts: 600e3
fixnum digit  total data   time       Kops/s
 bits  bits     (MiB)    (seconds)
[...]
   64    32        4.6     0.000    6122449.0
  128    32        9.2     0.000    3592814.4
  256    32...
```