Leo Fang issues

Results: 278 issues of Leo Fang

CUDA graph phase N - Support child graphs

triage

feature

cuda.core

CUDA graph phase N - CPU callbacks & user objects

The fun part would be: How to keep a generic Python object alive? https://docs.nvidia.com/cuda/cuda-programming-guide/04-special-topics/cuda-graphs.html#cuda-user-objects
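One possible way to reason about the lifetime question above is a registry that holds a strong reference until a release callback fires. The sketch below is pure Python and only illustrates the ownership pattern. `KeepAliveRegistry`, `retain`, and `release` are hypothetical names, not part of `cuda.core`, and the wiring to a CUDA user-object destructor is assumed rather than shown.

```python
import itertools


class KeepAliveRegistry:
    """Hypothetical helper: holds strong references until explicitly released."""

    def __init__(self):
        self._objects = {}
        self._next_handle = itertools.count(1)

    def retain(self, obj):
        # Store a strong reference and hand back an integer handle.
        # Assumption: the handle could travel as the opaque pointer-sized
        # value given to a user-object destructor callback.
        handle = next(self._next_handle)
        self._objects[handle] = obj
        return handle

    def release(self, handle):
        # Drop the strong reference; unknown handles are ignored so a
        # repeated release is harmless.
        self._objects.pop(handle, None)
```

Under this assumption, the destructor callback would call `release(handle)` once the graph no longer needs the object, at which point normal Python garbage collection takes over.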

triage

feature

cuda.core

CI: The release workflow should check if the versioned release note is missing

For example, we released `cuda-bindings` and `cuda-python` 13.1.0 yesterday, but we did not add `13.1.0-notes.rst` to https://github.com/NVIDIA/cuda-python/tree/main/cuda_python/docs/source/release.
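A minimal sketch of such a check, assuming the notes file is always named `<version>-notes.rst` as in the example above. The function name `missing_release_notes` and its parameters are hypothetical, and how the release workflow would obtain the version string is not specified here.

```python
from pathlib import Path


def missing_release_notes(release_dir, version):
    """Return the expected notes path if it is missing, else None.

    Assumes the naming scheme "<version>-notes.rst" inside release_dir.
    """
    expected = Path(release_dir) / f"{version}-notes.rst"
    return None if expected.is_file() else expected
```

A workflow step could call this with the release directory and the version being published, and fail the job when the result is not `None`.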

bug

triage

CI/CD

`cuda.core.Program`: Support Tile IR

Currently this is low priority because there is no such thing as "libtile", only `tileiras`, which is an executable. We prefer in-process compilation through compiler libraries over subprocess calls to...

feature

cuda.core

blocked

Update `pyproject.toml` for `uv`

Capturing feedback provided by @xiakun-lu offline. The NCCL team noticed that `uv sync` complains that `nccl4py[cu12]` and `nccl4py[cu13]` are incompatible (`uv venv && uv pip install -e .` works out of...

support

triage

cuda.bindings

cuda.core

CUDA graph phase N - explicit graph construction

Instead of relying on stream capturing, which is considered an implementation detail (that in the future we could allow users to opt in or out), our graph builder APIs were...

enhancement

triage

cuda.core

Test `numba_debug` in the libNVVM path too?

Follow-up of https://github.com/NVIDIA/cuda-python/pull/1216. We currently test the NVRTC path, but `cuda.core.Program` also covers libNVVM (and nvJitLink!) and we should get them tested too.

triage

test

cuda.core

Add a "tips & tricks" page

- many cuda.core operations do not actually need an active CUDA context (aka a device that is set to current)
- some just need CUDA to be initialized, meaning `cuInit(0)`...

documentation

triage

cuda.core

Add a "known issues & limitations" page

- cuda.core does not allow any side calls to `cudaDeviceReset` or the like that tear down the primary contexts
- avoid multiple frees of the same buffer in the child process...

documentation

triage

cuda.core

Support converting arbitrary objects to `StridedMemoryView` in `cuda.core.launch()`

enhancement

triage

cuda.core