cuda-fixnum issues

Sorry, cudaMallocManaged() is not implemented in the current version.

1

~/cuda-fixnum# ./bench/bench
Function: mul_lo, #elts: 1e3
fixnum digit  total data   time       Kops/s
 bits  bits     (MiB)    (seconds)
2021-08-01 09:49:54 [ERROR] Sorry, cudaMallocManaged() is not implemented in the current version. If user...

guiming-shi

Rewrite test suite to run via Python interface

1

The test suite currently runs through GTest, but it would be cleaner to run it via the Python interface developed in #59.

unzvfu

Implement faster Newton-Raphson

1

The initial estimate can be improved with a degree-two approximation; the iteration can also be modified to cube the error rather than simply square it. See [here](https://en.wikipedia.org/wiki/Division_algorithm#Variant_Newton-Raphson_division).

unzvfu

Investigate specialised mulmod for base*cuml in modexp

1

In the modular exponentiation algorithms, there are repeated mulmods of the form `A

unzvfu

Re-jigger the API to ease writing HLL interfaces

1

The current C++ API does not make it especially easy to write a high-level language (HLL) interface. Rework it so that it does. Codependent with #59.

unzvfu

Implement Python interface

1

Implement a Python interface based on the results of the analysis in #58. Codependent with #60.

unzvfu

Allow checking for overflow without penalising fast path

1

Currently, to assert that addition doesn't overflow, we write

```
add_cy(s, cy, a, b);
assert(digit::is_zero(cy));
```

If `NDEBUG` is not set, then this checks that overflow hasn't occurred. If `NDEBUG` is...
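
To make the pattern concrete, here is a single-digit, host-side sketch. The `add_cy` below is a hypothetical stand-in for a 64-bit digit addition with carry-out (its signature is an assumption; the real fixnum routine is a device function on multi-digit values), and `checked_add` is likewise illustrative. It shows that the check disappears from the fast path when `NDEBUG` is defined, because `assert` then expands to nothing.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical single-digit stand-in: s = a + b (mod 2^64), cy = carry out (0 or 1).
inline void add_cy(std::uint64_t &s, std::uint64_t &cy,
                   std::uint64_t a, std::uint64_t b) {
    s = a + b;             // unsigned addition wraps modulo 2^64
    cy = (s < a) ? 1 : 0;  // the sum wrapped iff it is smaller than an operand
}

// Illustrative wrapper: checks for overflow in debug builds only.
inline std::uint64_t checked_add(std::uint64_t a, std::uint64_t b) {
    std::uint64_t s, cy;
    add_cy(s, cy, a, b);
    assert(cy == 0);       // compiled out when NDEBUG is defined
    (void)cy;              // silences the unused-variable warning in release builds
    return s;
}
```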

unzvfu

Overhaul benchmarking system

1

Including, in no particular order

- [ ] Generate data and graph it, rather than generating large difficult-to-interpret tables.
- [ ] Generate data for all functions.
- [ ]...

unzvfu

Use warp votes to branch on argument size to select fastest algo

1

Something like this for example:

```
if (__all(bits < digit::BITS))
    algo_small_params(...);
else
    algo_generic(...);
```
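
The selection logic can be modelled on the host as follows. This is only a sketch of the decision a warp vote makes, not device code: `select_algo`, `Algo`, and the 64-bit `DIGIT_BITS` are assumptions made for illustration. On the GPU the vote would be a single intrinsic; note that since CUDA 9 the synchronising form `__all_sync(mask, predicate)` replaces the deprecated `__all(predicate)`.

```cpp
#include <algorithm>
#include <array>

constexpr int WARP_SIZE = 32;        // lanes per warp on current NVIDIA GPUs
constexpr unsigned DIGIT_BITS = 64;  // assumed digit width for this sketch

enum class Algo { SmallParams, Generic };

// Host-side model of the vote: the whole warp takes the small-parameter path
// only if every lane's argument fits in a single digit; otherwise every lane
// takes the generic path. Either way the branch is uniform across the warp.
Algo select_algo(const std::array<unsigned, WARP_SIZE> &bits) {
    bool all_small = std::all_of(bits.begin(), bits.end(),
                                 [](unsigned b) { return b < DIGIT_BITS; });
    return all_small ? Algo::SmallParams : Algo::Generic;
}
```

Because all lanes see the same vote result, the branch never diverges within a warp, which is the reason to vote rather than test `bits` per thread.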

unzvfu

Work out why 32-bit digits is slower than 64-bit digits with same fixnum size

1

For example:

```
Function: mul_lo, #elts: 600e3
fixnum digit  total data   time       Kops/s
 bits  bits     (MiB)    (seconds)
[...]
   64    32        4.6     0.000    6122449.0
  128    32        9.2     0.000    3592814.4
  256    32...
```

unzvfu
