Shao Tang
Updated with a fully parallel kernel.
> Isn't the `if` necessary for safety?

Correct. I was too focused on thread divergence and forgot about the basics :) Updated the PR.
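Why the `if` matters: when the grid is rounded up to whole blocks, some threads in the last block get indices past the end of the data, and without a bounds check they read and write out of bounds. A minimal sketch, not the kernel from this PR; `scale_kernel`, `num_blocks`, and `scale` are hypothetical names:

```cuda
#include <cuda_runtime.h>

// Each thread handles one element. The grid is rounded up to whole blocks,
// so the last block can contain threads whose index is >= n.
__global__ void scale_kernel(float* data, int n, float alpha) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    // Bounds guard: threads past the end of the array must do nothing,
    // otherwise they access memory outside `data`.
    if (i < n) {
        data[i] *= alpha;
    }
}

// Number of blocks needed to cover n elements (ceiling division).
inline int num_blocks(int n, int threads_per_block) {
    return (n + threads_per_block - 1) / threads_per_block;
}

// Host-side launcher; d_data must point to device memory holding n floats.
void scale(float* d_data, int n, float alpha) {
    const int threads = 256;
    scale_kernel<<<num_blocks(n, threads), threads>>>(d_data, n, alpha);
}
```

For example, with n = 1000 and 256 threads per block, `num_blocks` returns 4, so 1024 threads are launched and the guard masks off the last 24. Only threads in the last block can fail the check, so the divergence it introduces is small compared with the out-of-bounds access it prevents.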
Included a kernel that takes padded probs.
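The padded-probs kernel itself isn't shown in this thread. As a hedged illustration only, assuming `probs` is a row-major `[batch_size, padded_vocab]` buffer in which only the first `vocab_size` entries of each row are valid, a kernel can use `padded_vocab` as the row stride and `vocab_size` as the loop bound so padding is never read. The row-sum operation and all names below (`padded_row_sum_kernel`, `padded_row_sum`, `padded_row_sum_ref`, `kThreads`) are hypothetical:

```cuda
#include <cstddef>
#include <cuda_runtime.h>

// Assumed layout (hypothetical): row-major [batch_size, padded_vocab];
// only the first vocab_size entries of each row are real probabilities.
constexpr int kThreads = 256;  // must be a power of two for the tree reduction

// One block per row: sums the valid (unpadded) entries of that row.
__global__ void padded_row_sum_kernel(const float* probs, float* row_sums,
                                      int vocab_size, int padded_vocab) {
    __shared__ float partial[kThreads];
    // padded_vocab is the stride between rows, not the number of valid entries.
    const float* row = probs + static_cast<size_t>(blockIdx.x) * padded_vocab;

    // Each thread strides over valid columns only; padding is never read.
    float acc = 0.0f;
    for (int col = static_cast<int>(threadIdx.x); col < vocab_size; col += blockDim.x) {
        acc += row[col];
    }
    partial[threadIdx.x] = acc;
    __syncthreads();

    // Shared-memory tree reduction; __syncthreads stays outside the branch.
    for (unsigned s = blockDim.x / 2; s > 0; s >>= 1) {
        if (threadIdx.x < s) partial[threadIdx.x] += partial[threadIdx.x + s];
        __syncthreads();
    }
    if (threadIdx.x == 0) row_sums[blockIdx.x] = partial[0];
}

// Host launcher: one block of kThreads threads per row.
void padded_row_sum(const float* d_probs, float* d_row_sums,
                    int batch_size, int vocab_size, int padded_vocab) {
    padded_row_sum_kernel<<<batch_size, kThreads>>>(d_probs, d_row_sums,
                                                     vocab_size, padded_vocab);
}

// CPU reference using the same indexing, for checking the kernel's output.
float padded_row_sum_ref(const float* probs, int row, int vocab_size, int padded_vocab) {
    float acc = 0.0f;
    for (int col = 0; col < vocab_size; ++col) {
        acc += probs[static_cast<size_t>(row) * padded_vocab + col];
    }
    return acc;
}
```

Passing the row stride separately from the logical length lets one kernel serve both padded and unpadded inputs (set `padded_vocab == vocab_size` for the latter).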
1. Upgrade nvcc to 12.4.
2. Check the compute capability of your GPU against the guards in the source of `include/cuda_bf16.h` (or `.hpp`). You might see:

   ```
   #if defined(__CUDACC__) && (!defined(__CUDA_ARCH__) ||...
   ```
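To do step 2 programmatically, a sketch like the one below queries each GPU's compute capability with `cudaGetDeviceProperties` and maps it to the `__CUDA_ARCH__` encoding that such header guards test (e.g. 8.6 becomes 860). The `kMinBf16CudaArch = 800` threshold is an assumption (native bf16 arithmetic on Ampere and newer); the exact guard differs between CUDA versions, so confirm it against your own `cuda_bf16.hpp`. The helper names are hypothetical:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Assumption: treat sm_80 (__CUDA_ARCH__ == 800) as the minimum for native
// bf16 arithmetic. Verify against the guards in your cuda_bf16.hpp.
constexpr int kMinBf16CudaArch = 800;

// Encodes a compute capability the way __CUDA_ARCH__ does: 8.6 -> 860.
constexpr int cuda_arch_value(int major, int minor) {
    return major * 100 + minor * 10;
}

bool has_native_bf16(int major, int minor) {
    return cuda_arch_value(major, minor) >= kMinBf16CudaArch;
}

// Prints the compute capability of every visible GPU; returns the device count.
int report_devices() {
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        std::printf("cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
        return 0;
    }
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, dev) != cudaSuccess) continue;
        std::printf("GPU %d: %s, compute capability %d.%d (__CUDA_ARCH__ %d), bf16 guard met: %s\n",
                    dev, prop.name, prop.major, prop.minor,
                    cuda_arch_value(prop.major, prop.minor),
                    has_native_bf16(prop.major, prop.minor) ? "yes" : "no");
    }
    return count;
}
```

When compiling, a matching `-arch` (for example `nvcc -arch=sm_80`) makes `__CUDA_ARCH__` expand to that value during device compilation, which is what the header guard checks; `nvcc --version` shows which toolkit is in use for step 1.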