Cytnx
CUDA streams are missing
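The GPU kernels need a mechanism to set and manage the CUDA stream they run on. This is essential for running independent work concurrently on the GPU.
This can be done either with explicit streams or simply with host threads. Since CUDA 7, each host thread can have its own default stream instead of all threads sharing one default stream, although this is opt-in (for example, by compiling with `nvcc --default-stream per-thread`).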
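To make the request concrete, here is a minimal sketch of the explicit-stream variant. It is not Cytnx code: `scale_kernel` and `scale_on_streams` are hypothetical names invented for illustration, and only standard CUDA runtime calls are used. The idea is that every kernel launch takes a `cudaStream_t`, so independent chunks of work can be queued on different streams.

```cuda
#include <cuda_runtime.h>
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical kernel (not part of Cytnx): scales n elements in place.
__global__ void scale_kernel(float* data, std::size_t n, float factor) {
    std::size_t i = blockIdx.x * static_cast<std::size_t>(blockDim.x) + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Hypothetical helper: splits `host` into chunks and queues each chunk
// (copy in, kernel, copy out) on its own stream.
// Returns true only if every CUDA call succeeded.
bool scale_on_streams(std::vector<float>& host, float factor, int num_streams) {
    const std::size_t n = host.size();
    if (n == 0 || num_streams <= 0) return n == 0;

    float* dev = nullptr;
    if (cudaMalloc(&dev, n * sizeof(float)) != cudaSuccess) return false;

    bool ok = true;
    std::vector<cudaStream_t> streams(num_streams);
    int created = 0;
    for (; created < num_streams; ++created) {
        if (cudaStreamCreate(&streams[created]) != cudaSuccess) { ok = false; break; }
    }

    const std::size_t chunk = (n + num_streams - 1) / num_streams;
    for (int s = 0; ok && s < num_streams; ++s) {
        const std::size_t offset = s * chunk;
        if (offset >= n) break;
        const std::size_t len = std::min(chunk, n - offset);
        const unsigned blocks = static_cast<unsigned>((len + 255) / 256);

        // Operations queued on the same stream run in order; operations on
        // different streams may overlap. Real copy/compute overlap also needs
        // pinned host memory (e.g. cudaMallocHost), which this sketch skips.
        ok = ok && cudaMemcpyAsync(dev + offset, host.data() + offset,
                                   len * sizeof(float),
                                   cudaMemcpyHostToDevice, streams[s]) == cudaSuccess;
        scale_kernel<<<blocks, 256, 0, streams[s]>>>(dev + offset, len, factor);
        ok = ok && cudaGetLastError() == cudaSuccess;
        ok = ok && cudaMemcpyAsync(host.data() + offset, dev + offset,
                                   len * sizeof(float),
                                   cudaMemcpyDeviceToHost, streams[s]) == cudaSuccess;
    }

    // Wait for every stream that was created, then release resources.
    for (int s = 0; s < created; ++s) {
        ok = (cudaStreamSynchronize(streams[s]) == cudaSuccess) && ok;
        cudaStreamDestroy(streams[s]);
    }
    cudaFree(dev);
    return ok;
}
```

Calling `scale_on_streams(v, 2.0f, 4)` on a `std::vector<float> v` would double every element using four streams. With the thread-based alternative, the stream argument could be left out of the launches and each host thread would use its own default stream, provided the code is built with per-thread default streams enabled.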