cuda-python
CUDA Python: Performance meets Productivity
An `ObjectCode` instance can encapsulate PTX, LTO-IR, or CUBIN, all of which can be serialized. This would help us, as well as any downstream project, implement a simple persistent...
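A minimal sketch of the persistent cache this would enable. Everything here is illustrative: a plain `bytes` payload stands in for a serialized `ObjectCode`, and `compile_source` is a hypothetical stand-in for an actual `Program(...).compile()` call, so the sketch runs without a GPU.

```python
import hashlib
import pathlib
import tempfile

# Hypothetical on-disk cache location for serialized object code.
CACHE_DIR = pathlib.Path(tempfile.gettempdir()) / "kernel_cache"


def compile_source(source: str) -> bytes:
    # Stand-in for real compilation returning serialized PTX/LTO-IR/CUBIN.
    return source.encode()


def get_object_code(source: str) -> bytes:
    """Return serialized object code, compiling only on a cache miss."""
    CACHE_DIR.mkdir(parents=True, exist_ok=True)
    key = hashlib.sha256(source.encode()).hexdigest()
    path = CACHE_DIR / f"{key}.bin"
    if path.exists():
        return path.read_bytes()  # cache hit: skip recompilation entirely
    data = compile_source(source)
    path.write_bytes(data)  # cache miss: compile once, persist for next run
    return data
```

The cache key is a hash of the source, so unchanged kernels are never recompiled across processes.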
We want to automate this so that we don't need to revisit all past PRs by release time. During a recent offline discussion, https://sphinx-github-changelog.readthedocs.io/en/latest/ was suggested, but upon a closer...
Today, `LaunchConfig` only supports the `cuLaunchKernel` driver API to launch kernels on a single GPU. When extending to broader use cases that require inter-SM synchronization or multi-GPU synchronization,...
Getting the current device using `cuda.core` is quite a bit slower than CuPy:

```python
In [1]: import cupy as cp

In [2]: %timeit cp.cuda.Device()
69 ns ± 0.496 ns per...
```
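A toy illustration (not `cuda.core` or CuPy internals) of one common way this kind of per-call overhead is reduced: returning a cached device object instead of constructing a fresh one on every call. The `Device` class and `_cached` singleton below are made up for the comparison.

```python
import timeit

class Device:
    """Stand-in for a device handle; the constructor mimics per-call work."""
    def __init__(self):
        self.device_id = 0  # stand-in for a driver query

_cached = Device()

def fresh():
    # Constructs a new object every call, paying allocation + init cost.
    return Device()

def cached():
    # Returns the memoized singleton, avoiding construction entirely.
    return _cached

t_fresh = timeit.timeit(fresh, number=100_000)
t_cached = timeit.timeit(cached, number=100_000)
```

Whether caching is the actual source of the gap here would need profiling; the sketch only shows the pattern.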
For example, attestation seems like a fancy thing we can easily do https://scientific-python.org/specs/spec-0008/#example-workflow
Would it be possible to make numpy an optional dependency for `cuda.core`? For example, if you just want to use `cuda.core` to query system device properties, installing a BLAS implementation...
https://github.com/dmlc/dlpack/releases/tag/v1.1
Currently the program name when compiling `c++` code with `Program` always [defaults](https://github.com/NVIDIA/cuda-python/blob/main/cuda_core/cuda/core/experimental/_program.py#L399) to `'default_program'`. This makes errors a little less descriptive than what is currently supported by `numba-cuda`. For instance...
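A sketch of what a user-supplied program name could buy in diagnostics. This is a hypothetical API, not the current `Program` signature: the toy class below just shows the name being threaded into the error message instead of `'default_program'`.

```python
class Program:
    """Toy model: accept a user-supplied name for better error messages."""

    def __init__(self, code, name="default_program"):
        self.code = code
        self.name = name

    def compile(self):
        if "syntax error" in self.code:
            # The name lets the diagnostic point at the originating source.
            raise RuntimeError(f"{self.name}(1): compilation failed")
        return b"cubin"  # stand-in for compiled object code
```

With `Program(src, name="my_kernel.cu")`, a failure reads `my_kernel.cu(1): ...` rather than `default_program(1): ...`, matching what `numba-cuda` already offers.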