nvidia-cuda-tutorial icon indicating copy to clipboard operation
nvidia-cuda-tutorial copied to clipboard

Nvidia contributed CUDA tutorial for Numba

Results 4 nvidia-cuda-tutorial issues
Sort by recently updated
recently updated
newest added

Failing to implement these results in weird behaviour for parameterised types: - `__hash__` is required for correct interning. - `__eq__` is required to determine if casts are required.

Some parts of the course use kernel definitions from the `definition` and `definitions` properties. These properties are deprecated, and should be replaced with the use of `overloads` instead.

The section on the widening on integer indices produced in a loop over a `range` seems to accidentally be missing - it should be just before the "Limiting register usage"...

[Grid groups and grid sync](https://numba.readthedocs.io/en/latest/cuda/cooperative_groups.html) were added in Numba 0.53.1. A short section on using these to implement a global barrier would be good, perhaps based around the example kernel...