Leo Fang
The fun part would be: how do we keep a generic Python object alive? See CUDA user objects: https://docs.nvidia.com/cuda/cuda-programming-guide/04-special-topics/cuda-graphs.html#cuda-user-objects
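A minimal, CUDA-free sketch of the keep-alive idea, assuming the user object's destroy callback can be routed back into Python: hold a strong reference in a registry keyed by an integer handle (the kind of value one might pass as the user object's pointer), and drop that reference when the destructor fires. `KeepAliveRegistry`, `retain`, and `release` are hypothetical names, not cuda-python APIs. The lock is there because the destructor could run on a different thread than the one that created the object.

```python
import itertools
import threading


class KeepAliveRegistry:
    """Hypothetical helper: holds strong references to Python objects until
    a destroy callback (e.g. a CUDA user object's destructor) releases them."""

    def __init__(self):
        self._lock = threading.Lock()
        self._objects = {}                     # handle -> strong reference
        self._next_handle = itertools.count(1)

    def retain(self, obj):
        """Store a strong reference to obj and return an integer handle."""
        with self._lock:
            handle = next(self._next_handle)
            self._objects[handle] = obj
        return handle

    def release(self, handle):
        """Destroy-callback body: drop the reference so obj can be collected.
        Unknown or already-released handles are ignored."""
        with self._lock:
            self._objects.pop(handle, None)

    def __len__(self):
        with self._lock:
            return len(self._objects)


# Usage: keep an object alive for as long as a (simulated) graph owns it.
registry = KeepAliveRegistry()
payload = {"kernel_args": [1, 2, 3]}
handle = registry.retain(payload)
del payload                  # the registry still holds a reference
assert len(registry) == 1
registry.release(handle)     # in a real binding, the user-object destructor would call this
assert len(registry) == 0
```

In an actual binding, `release` would be wired up as the destroy callback; the tricky part, presumably, is that this callback is invoked by the driver rather than by Python code, so entering the interpreter from it needs care.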
For example, we released `cuda-bindings` and `cuda-python` 13.1.0 yesterday, but we did not add `13.1.0-notes.rst` to https://github.com/NVIDIA/cuda-python/tree/main/cuda_python/docs/source/release.
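A minimal sketch of a release-time check that could catch a missing notes file, assuming notes follow the `<version>-notes.rst` naming shown above and live in a per-package directory such as `cuda_python/docs/source/release`. `missing_release_notes` is a hypothetical helper, not something that exists in the repo.

```python
from pathlib import Path


def missing_release_notes(release_dir, versions):
    """Return the versions that have no `<version>-notes.rst` file in release_dir."""
    release_dir = Path(release_dir)
    return [v for v in versions if not (release_dir / f"{v}-notes.rst").is_file()]
```

A CI job could call, e.g., `missing_release_notes("cuda_python/docs/source/release", ["13.1.0"])` with the version being tagged and fail if the returned list is non-empty.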
Currently this is low priority because there is no such thing as a "libtile", only `tileiras`, which is an executable. We prefer in-process compilation through compiler libraries over subprocess calls to...
Capturing feedback provided offline by @xiakun-lu. The NCCL team noticed that `uv sync` complains that `nccl4py[cu12]` and `nccl4py[cu13]` are incompatible (`uv venv && uv pip install -e .` works out of...
Instead of relying on stream capture, which is considered an implementation detail (one that, in the future, we could let users opt in to or out of), our graph builder APIs were...
Follow-up to https://github.com/NVIDIA/cuda-python/pull/1216. We currently test the NVRTC path, but `cuda.core.Program` also covers libNVVM (and nvJitLink!), so we should get those paths tested too.
- many cuda.core operations do not actually need an active CUDA context (aka a device that is set as current); some just need CUDA to be initialized, meaning `cuInit(0)`...
- cuda.core does not allow any side calls to `cudaDeviceReset` or similar APIs that tear down the primary contexts
- avoid multiple frees of the same buffer in the child process...
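For the point about avoiding multiple frees of the same buffer in the child process, one common pattern is to record the PID of the process that allocated a buffer and skip the real free when a different process (a forked child) releases it. A minimal sketch of that pattern, with a hypothetical `Buffer` class and an injected `free_fn` standing in for the actual deallocation call; it is not taken from cuda.core.

```python
import os


class Buffer:
    """Hypothetical buffer that only frees its memory in the owning process."""

    def __init__(self, ptr, free_fn):
        self.ptr = ptr
        self._free_fn = free_fn          # stand-in for the real deallocation call
        self._owner_pid = os.getpid()    # process that allocated the memory
        self._freed = False

    def close(self):
        # Treat a half-constructed object (no _freed attribute) as already freed.
        if getattr(self, "_freed", True):
            return                       # idempotent: never free twice
        self._freed = True
        if os.getpid() != self._owner_pid:
            # Forked child: the memory belongs to the parent's context,
            # so only drop the Python-side handle.
            return
        self._free_fn(self.ptr)

    def __del__(self):
        self.close()
```

An alternative design is `os.register_at_fork(after_in_child=...)`, which could mark every live buffer as not owned right after the fork instead of checking the PID on each free.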