cuda-python
CUDA Python: Performance meets Productivity
@gigony pointed out that the following example is unclear compared to CuPy's RawKernel, and I agree. https://github.com/NVIDIA/cuda-python/blob/e1e332564c48db556212d59262a149b1a63285e8/docs_src/source/overview.md?plain=1#L208-L223 Specifically, it is really unclear how to pass typed pointers and scalars. This also raises the...
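To illustrate what "typed pointers and scalars" means at launch time, here is a minimal sketch of the packing convention used by the CUDA driver API's `cuLaunchKernel`, whose `kernelParams` argument is an array of pointers to each argument's storage. It uses only `ctypes` and needs no GPU. The helper `pack_kernel_args` and the placeholder device address are hypothetical and are not part of cuda-python; this is not how the linked overview example is written.

```python
import ctypes


def pack_kernel_args(*typed_values):
    """Hypothetical helper (not a cuda-python API).

    Takes (ctype, value) pairs and returns the per-argument storage plus a
    void*-array whose slots hold the address of each argument.
    """
    # Give every argument its own typed storage so its size is explicit.
    storage = [ctype(value) for ctype, value in typed_values]
    # Each slot of the array points at one argument's storage.
    arg_ptrs = (ctypes.c_void_p * len(storage))(
        *(ctypes.addressof(item) for item in storage)
    )
    # The caller must keep `storage` alive for as long as `arg_ptrs` is used.
    return storage, arg_ptrs


# Placeholder standing in for a device address; an assumption for illustration.
fake_device_ptr = 0x7F0000000000

storage, arg_ptrs = pack_kernel_args(
    (ctypes.c_float, 2.0),              # scalar: float a
    (ctypes.c_void_p, fake_device_ptr), # pointer: float* x
    (ctypes.c_size_t, 1024),            # scalar: size_t n
)
```

The point of the sketch is that the type of each scalar has to be stated explicitly, because a bare Python `int` or `float` does not say whether the kernel expects 4 or 8 bytes.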
On CUDA < 12, we return a cuFunction from `get_kernel()`. Create an xfail test that launches the kernel on two distinct contexts.
Building `cuda.bindings` currently emits two types of warnings, and we should decide how to handle them. 1. dereferencing type-punned pointer will break strict-aliasing rules ```...
Follow-up to #208. We should revisit the simple wrapper over `cudaMalloc` once we have a better `MemoryResource`, and for that we might need [`pool_memory_resource` from RMM](https://github.com/rapidsai/rmm#pool_memory_resource) or perhaps CCCL if it's...
Steps:
1. Add another workflow triggered every night
2. Reuse the latest artifacts generated from the main branch (as seen at the time of tests)
3. Only run tests and...
> I actually want these jobs if possible, yes, but I can't do so with the existing `[all]` notation for installing the latest CUDA wheels (we let them float and...