nvidia-cuda-tutorial
nvidia-cuda-tutorial copied to clipboard
Nvidia contributed CUDA tutorial for Numba
Failing to implement these results in weird behaviour for parameterised types: - `__hash__` is required for correct interning. - `__eq__` is required to determine if casts are required.
Some parts of the course use kernel definitions from the `definition` and `definitions` properties. These properties are deprecated, and should be replaced with the use of `overloads` instead.
The section on the widening on integer indices produced in a loop over a `range` seems to accidentally be missing - it should be just before the "Limiting register usage"...
[Grid groups and grid sync](https://numba.readthedocs.io/en/latest/cuda/cooperative_groups.html) were added in Numba 0.53.1. A short section on using these to implement a global barrier would be good, perhaps based around the example kernel...