Leo Fang

Results: 1,175 comments by Leo Fang

This one too: https://github.com/NVIDIA/cuda-python/blob/f9f67c8ba2698c56eede95fe1b55bca3ce26d9a4/cuda_bindings/cuda/bindings/runtime.pyx.in#L20967

Sphinx output:
```
docstring of cuda.bindings.runtime.cudaMemcpy:19: CRITICAL: Unexpected section title.

Parameters
----------
 [docutils]
docstring of cuda.bindings.runtime.cudaMemcpy:30: CRITICAL: Unexpected section title.

Returns
-------
 [docutils]
docstring of cuda.bindings.runtime.cudaMemcpy:35:...
```

It seems we just need to add `LaunchConfig.cooperative_launch: bool = False`, and map it to `CU_LAUNCH_ATTRIBUTE_COOPERATIVE` when preparing the launch config for `cuLaunchKernelEx`. What would be the right way to...

To make it 100% clear: internal dispatching to `cuLaunchCooperativeKernel()` is not necessary. `cuLaunchKernelEx()` is guaranteed to work if the attribute is set.
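As a rough illustration of that mapping, here is a pure-Python sketch. The names `LaunchConfig`, `to_launch_attributes`, and the hard-coded attribute value are stand-ins for the real cuda.core / driver types and need not match the actual implementation:

```python
from dataclasses import dataclass

# Hypothetical stand-in for the driver enum value; the real binding exposes
# it as CUlaunchAttributeID.CU_LAUNCH_ATTRIBUTE_COOPERATIVE.
CU_LAUNCH_ATTRIBUTE_COOPERATIVE = 2

@dataclass
class LaunchConfig:
    grid: tuple = (1, 1, 1)
    block: tuple = (1, 1, 1)
    cooperative_launch: bool = False  # the proposed new field

def to_launch_attributes(cfg: LaunchConfig) -> list[tuple[int, int]]:
    """Map high-level config fields to (attribute_id, value) pairs that
    would be packed into the CUlaunchConfig passed to cuLaunchKernelEx."""
    attrs = []
    if cfg.cooperative_launch:
        attrs.append((CU_LAUNCH_ATTRIBUTE_COOPERATIVE, 1))
    return attrs

# e.g. a cooperative launch request yields one attribute pair
print(to_launch_attributes(LaunchConfig(cooperative_launch=True)))  # → [(2, 1)]
```

The point of the shape above is that `cooperative_launch` stays a plain boolean on the user-facing config, and the translation to a driver launch attribute happens once, at launch-preparation time.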

> * Option (a) if `LaunchConfig` only exposes the `cooperative_launch` attribute, then in the use case of multi-GPU or grid-sync, there need to be additional legalization checks to make sure that `gridDims`...

FWIW the occupancy query APIs are tracked in #504, due to the request from the CUTLASS team, but they seem needed as part of this discussion too.
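To make the legalization point concrete: a cooperative launch requires all blocks to be co-resident, so the grid must not exceed max-active-blocks-per-SM (as returned by an occupancy query such as `cuOccupancyMaxActiveBlocksPerMultiprocessor`) times the device's SM count. The helper and numbers below are an illustrative sketch, not the actual cuda.core API:

```python
def check_cooperative_grid(grid_blocks: int,
                           max_blocks_per_sm: int,
                           num_sms: int) -> None:
    """Reject grids too large for a cooperative launch.

    All blocks of a cooperative launch must be resident simultaneously,
    so the grid cannot exceed max_blocks_per_sm * num_sms, where
    max_blocks_per_sm would come from an occupancy query for the kernel.
    """
    limit = max_blocks_per_sm * num_sms
    if grid_blocks > limit:
        raise ValueError(
            f"cooperative launch needs {grid_blocks} co-resident blocks, "
            f"but the device can only hold {limit}")

# e.g. 2 blocks/SM on a hypothetical 108-SM device -> at most 216 blocks
check_cooperative_grid(216, 2, 108)  # fits; no error raised
```

A check along these lines is what option (a) would have to run internally before handing the grid dimensions to the driver.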

> Throw errors if launch hangs due to use of grid.sync without cuLaunchKernelEx(CU_LAUNCH_ATTRIBUTE_COOPERATIVE)

This does not seem to hang; it seems to raise a sticky `CUDA_ERROR_LAUNCH_FAILED` error. @pciolkosz Does this...

Cooperative launch is implemented in #676. Moving this issue to P1 / parking lot. We still want to keep this open for covering more launch attributes.

A few pointers to consider when we design this:
- https://github.com/pytorch/pytorch/pull/130386
- https://github.com/pytorch/pytorch/pull/137318
- https://github.com/cupy/cupy/pull/8615
- https://github.com/numba/numba/pull/4182

Discussed internally. With all things considered, we will take a multi-phase approach to iteratively enhance the CUDA graph coverage. Below are the phase-1 design considerations:
- Only cover stream capture (no...

Design is being wrapped up with a prototype (#455). Moving this to beta 4.