icicle
a GPU Library for Zero-Knowledge Acceleration
## Description Add CUDA graph support ## Motivation Improve GPU utilization and performance
Host buffers currently live in pageable memory, while cudaMalloc is only used for the device side. Allocating the host buffers as pinned memory with cudaMallocHost should give higher host-device transfer bandwidth.
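As a minimal sketch of what the switch to pinned host memory could look like, the snippet below allocates page-locked host buffers with `cudaMallocHost` and round-trips them through the device with `cudaMemcpyAsync`. The helper name `roundtrip_pinned`, the buffer sizes, and the element type are illustrative assumptions and are not taken from icicle.

```cuda
#include <cstdint>
#include <cstdio>
#include <cuda_runtime.h>

// Returns false from the enclosing function on the first failing CUDA call.
// (A sketch: resources allocated before the failure are not released.)
#define CUDA_TRY(call)                                                   \
  do {                                                                   \
    cudaError_t err_ = (call);                                           \
    if (err_ != cudaSuccess) {                                           \
      fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err_));     \
      return false;                                                      \
    }                                                                    \
  } while (0)

// Hypothetical helper: copies n words host -> device -> host using pinned
// host buffers and reports whether the data survived the round trip.
bool roundtrip_pinned(size_t n)
{
  const size_t bytes = n * sizeof(uint32_t);
  uint32_t* h_src = nullptr;
  uint32_t* h_dst = nullptr;
  uint32_t* d_buf = nullptr;
  cudaStream_t stream;

  // Pinned (page-locked) host allocations instead of malloc/new.
  CUDA_TRY(cudaMallocHost((void**)&h_src, bytes));
  CUDA_TRY(cudaMallocHost((void**)&h_dst, bytes));
  CUDA_TRY(cudaMalloc((void**)&d_buf, bytes));
  CUDA_TRY(cudaStreamCreate(&stream));

  for (size_t i = 0; i < n; ++i) h_src[i] = (uint32_t)i;

  // With pinned buffers these copies can run asynchronously to the host.
  CUDA_TRY(cudaMemcpyAsync(d_buf, h_src, bytes, cudaMemcpyHostToDevice, stream));
  CUDA_TRY(cudaMemcpyAsync(h_dst, d_buf, bytes, cudaMemcpyDeviceToHost, stream));
  CUDA_TRY(cudaStreamSynchronize(stream));

  bool ok = true;
  for (size_t i = 0; i < n; ++i)
    if (h_dst[i] != h_src[i]) ok = false;

  cudaStreamDestroy(stream);
  cudaFree(d_buf);
  cudaFreeHost(h_dst); // pinned memory is released with cudaFreeHost
  cudaFreeHost(h_src);
  return ok;
}

int main()
{
  printf("round trip %s\n", roundtrip_pinned(1 << 20) ? "ok" : "failed");
  return 0;
}
```

Pinned memory is a limited resource, so in practice it is usually reserved for the large scalar and point buffers that are transferred repeatedly.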
Curious what the rationale is for the window size of c = 10 bits per scalar (26 windows)? It feels like this choice could be optimized.
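To make the trade-off behind the window size concrete, here is a rough host-side cost model for the bucket (Pippenger) method. It is a sketch under stated assumptions, not icicle's implementation: it assumes 255-bit scalars (which gives 26 windows at c = 10), counts about n bucket insertions plus 2 * (2^c - 1) additions for the running-sum reduction per window, and ignores doublings and GPU parallelism. The function names are hypothetical.

```cuda
#include <cstdint>
#include <cstdio>

// Number of c-bit windows needed to cover a scalar of the given bit length.
unsigned num_windows(unsigned scalar_bits, unsigned c)
{
  return (scalar_bits + c - 1) / c; // ceiling division
}

// Rough count of group additions for an MSM of size n with window size c.
// Per window: n bucket insertions + about 2 * (2^c - 1) additions to fold
// the buckets with a running sum. This is a serial operation count only.
uint64_t approx_additions(uint64_t n, unsigned scalar_bits, unsigned c)
{
  const uint64_t buckets = (1ULL << c) - 1;
  return (uint64_t)num_windows(scalar_bits, c) * (n + 2 * buckets);
}

int main()
{
  const unsigned scalar_bits = 255; // assumption, see lead-in
  const uint64_t n = 1ULL << 20;    // illustrative MSM size

  for (unsigned c = 8; c <= 20; c += 2) {
    printf("c=%2u windows=%2u approx_additions=%llu\n", c,
           num_windows(scalar_bits, c),
           (unsigned long long)approx_additions(n, scalar_bits, c));
  }
  return 0;
}
```

Under this model larger windows reduce the number of passes over the points but increase the bucket-reduction work, so the best c depends on n. On a GPU, memory footprint and the parallelism available per window also matter, which this count does not capture.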
The for loop in this kernel can be eliminated by using **cooperative groups**. Instead of a single thread looping over all the limbs of a single scalar, multiple threads...
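The excerpt is truncated and does not name the kernel, so the following is only an illustration of the general pattern: a cooperative-groups tile of 8 threads handles one scalar, with each thread owning one limb, instead of one thread looping over all limbs. The zero-check operation, the 8 x 32-bit limb layout, and the names `is_zero_kernel` and `zero_flags` are assumptions made for this sketch.

```cuda
#include <cooperative_groups.h>
#include <cstdint>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

namespace cg = cooperative_groups;

constexpr unsigned LIMBS = 8; // assumption: 256-bit scalar as 8 x 32-bit limbs

// One tile of LIMBS threads per scalar; each thread reads exactly one limb.
// blockDim.x must be a multiple of LIMBS so tiles never straddle scalars.
__global__ void is_zero_kernel(const uint32_t* scalars, unsigned n, int* is_zero)
{
  cg::thread_block block = cg::this_thread_block();
  cg::thread_block_tile<LIMBS> tile = cg::tiled_partition<LIMBS>(block);

  const unsigned scalar_idx = (blockIdx.x * blockDim.x + threadIdx.x) / LIMBS;
  const unsigned limb_idx = tile.thread_rank();

  // Out-of-range tiles load 0 instead of returning early, so every thread
  // of the tile still takes part in the collective below.
  const uint32_t limb = (scalar_idx < n) ? scalars[scalar_idx * LIMBS + limb_idx] : 0u;

  // Tile-wide vote replaces the per-thread loop over limbs.
  const bool any_nonzero = tile.any(limb != 0);

  if (scalar_idx < n && limb_idx == 0) is_zero[scalar_idx] = any_nonzero ? 0 : 1;
}

// Host wrapper (error checking omitted for brevity in this sketch).
std::vector<int> zero_flags(const std::vector<uint32_t>& limbs)
{
  const unsigned n = (unsigned)(limbs.size() / LIMBS);
  std::vector<int> flags(n, -1);
  if (n == 0) return flags;

  uint32_t* d_scalars = nullptr;
  int* d_flags = nullptr;
  cudaMalloc((void**)&d_scalars, n * LIMBS * sizeof(uint32_t));
  cudaMalloc((void**)&d_flags, n * sizeof(int));
  cudaMemcpy(d_scalars, limbs.data(), n * LIMBS * sizeof(uint32_t), cudaMemcpyHostToDevice);

  const unsigned threads = 256; // multiple of LIMBS
  const unsigned blocks = (n * LIMBS + threads - 1) / threads;
  is_zero_kernel<<<blocks, threads>>>(d_scalars, n, d_flags);

  cudaMemcpy(flags.data(), d_flags, n * sizeof(int), cudaMemcpyDeviceToHost);
  cudaFree(d_flags);
  cudaFree(d_scalars);
  return flags;
}

int main()
{
  std::vector<uint32_t> limbs(2 * LIMBS, 0u); // two scalars, both zero
  limbs[LIMBS + 3] = 5u;                      // make the second one nonzero
  for (int f : zero_flags(limbs)) printf("%d ", f);
  printf("\n");
  return 0;
}
```

The same tile layout extends to other per-limb work, with `tile.shfl` or `cg::reduce` carrying values between the threads that share a scalar.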
## Description Currently we don't have the functionality to open committed polynomials at arbitrary points. ## Motivation In Danksharding, KZG openings are performed at specific carefully chosen points, which makes...
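For reference, the standard KZG opening of a committed polynomial $p$ at an arbitrary point $z$ can be written as follows. The notation ($\tau$ for the trusted-setup secret, $[\cdot]_1$ and $[\cdot]_2$ for elements of the two pairing groups, $e$ for the pairing) is chosen for this sketch and is not taken from the issue.

$$
\begin{aligned}
C &= [p(\tau)]_1, \qquad y = p(z), \\
q(X) &= \frac{p(X) - y}{X - z}, \qquad \pi = [q(\tau)]_1, \\
e\big(C - [y]_1,\ [1]_2\big) &= e\big(\pi,\ [\tau - z]_2\big).
\end{aligned}
$$

The prover-side cost is dominated by the polynomial division that yields $q$ and by the MSM that computes $\pi$, which is the part a GPU library would be expected to accelerate.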
## Description The current [msm.cu](https://github.com/ingonyama-zk/icicle/blob/main/icicle/appUtils/msm/msm.cu) implementation launches a monolithic kernel in several places like this: ```cuda unsigned NUM_THREADS = 1
1. https://github.com/ingonyama-zk/icicle/blob/ef8beb8d0c7d481b8f3f3a47d48695889e4079b3/icicle/appUtils/ntt/ntt.cuh#L323 2. https://github.com/ingonyama-zk/icicle/blob/ef8beb8d0c7d481b8f3f3a47d48695889e4079b3/icicle/appUtils/ntt/ntt.cuh#L199 3. https://github.com/ingonyama-zk/icicle/blob/ef8beb8d0c7d481b8f3f3a47d48695889e4079b3/icicle/appUtils/ntt/ntt.cuh#L338
## Description Aleo created their own version of Marlin called [Varuna](https://drive.google.com/file/d/1W9vsn5xT1vUmJbzO8VXoNS4W1wGWLDHN/view), which is (as far as I understand) what is run when https://github.com/AleoHQ/snarkVM/blob/testnet3/synthesizer/src/vm/execute.rs#L26 is executed. Because the time outside witness generation...