Leo Fang issues

Results: 278 issues of Leo Fang

Add tests to cover `cuda.core.experimental.launch()`

We support

- Python scalars
- NumPy scalars
- ctypes scalars

These should all be tested in `test_launcher.py`.

triage

test

cuda.core

Implement `from_handle()` for all `cuda.core` objects

We wanted to do this but it seems so far we've only covered `Stream.from_handle`. We want to also cover these objects:

- `Program`
- `ObjectCode`
- `Kernel`
- `Buffer`

triage

feature

cuda.core

`cuda.core.launch()` improvements

- Support ctypes/numpy structs
- make sure ctypes is deprioritized
- Support converting arbitrary objects to `StridedMemoryView`
- Benchmarking
- measure `launch()` overhead
- reimplement type dispatcher via dict lookup...
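As a rough illustration of the "type dispatcher via dict lookup" item, here is a minimal stdlib-only sketch. The names `_CONVERTERS` and `to_kernel_arg` are hypothetical, and the chosen C types are assumptions; none of this is taken from the actual `cuda.core` implementation. The exact-type dict lookup runs first, and the slower `isinstance` check for ctypes objects runs last, which is one possible reading of "ctypes is deprioritized".

```python
import ctypes

# Hypothetical mapping from exact Python types to ctypes constructors.
# The C types chosen here are assumptions made for illustration only.
_CONVERTERS = {
    bool: ctypes.c_bool,
    int: ctypes.c_int64,
    float: ctypes.c_double,
}


def to_kernel_arg(value):
    """Convert a scalar to a ctypes object (hypothetical helper)."""
    # Fast path: one dict lookup on the exact type. Using type(value)
    # rather than isinstance keeps bool from being treated as int.
    converter = _CONVERTERS.get(type(value))
    if converter is not None:
        return converter(value)
    # Slow path, checked last: ctypes scalars and structs pass through.
    if isinstance(value, (ctypes._SimpleCData, ctypes.Structure)):
        return value
    raise TypeError(f"unsupported kernel argument type: {type(value).__name__}")
```

NumPy scalar types could presumably be registered in the same dict; they are left out here so the sketch has no third-party dependencies.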

enhancement

triage

cuda.core

Refactor for better handling of `cuModuleLoadDataEx`/`cuLibraryLoadData`

> This is a bit trickier than I thought, because we also need the dict key "old"/"new" as a proxy to prepare for `args` (which is different for `cuModuleLoadDataEx`/`cuLibraryLoadData`). Let's...

enhancement

cuda.core

Perf: Reduce `StridedMemoryView` construction time

Currently it takes 3.4 to 3.45 µs (depending on stream-ordering or not) to create a memory view object:

```python
In [4]: x = cp.empty((23, 4))
In [7]: %timeit s =...
```

enhancement

triage

cuda.core

Support NVVM IRs as input to `Program`

feature

cuda.core

Document best practices in preparing arguments for `cuLaunchKernel`

@gigony complained that the following example is unclear compared to CuPy's `RawKernel`, and I agree:

https://github.com/NVIDIA/cuda-python/blob/e1e332564c48db556212d59262a149b1a63285e8/docs_src/source/overview.md?plain=1#L208-L223

Specifically, how to pass typed pointers and scalars is really unclear. This also raises the...
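For context, `cuLaunchKernel` receives its `kernelParams` as an array of pointers, where each entry points at storage holding one argument. The sketch below shows that layout using only `ctypes`; it does not call the driver. The helper `pack_kernel_params` and the device address are made up for illustration, and this is not the approach used in the linked overview.

```python
import ctypes


def pack_kernel_params(*args):
    """Build a void*[] whose entries point at each ctypes argument.

    Hypothetical helper. The returned `storage` list must stay alive
    for as long as `params` is in use, otherwise the pointers dangle.
    """
    storage = list(args)
    addresses = [ctypes.addressof(arg) for arg in storage]
    params = (ctypes.c_void_p * len(storage))(*addresses)
    return storage, params


# A device pointer is passed as a pointer-sized integer; scalars are
# passed with an explicit C type matching the kernel signature.
d_ptr = ctypes.c_void_p(0x7F0000000000)  # made-up device address
alpha = ctypes.c_float(2.0)
n = ctypes.c_size_t(1024)

storage, params = pack_kernel_params(d_ptr, alpha, n)
```

Each entry of `params` is the host address of the corresponding argument, so the explicit C type chosen for a scalar determines how many bytes the driver reads for it.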

documentation

cuda.bindings

Support cudart wheels

enhancement

packaging

cuda.bindings

Remove trampoline modules from the old layout

cuda.bindings

blocked

Evaluate feasibility of adding `pool_memory_resource`

Follow-up of #208. We should revisit the simple wrapper over `cudaMalloc` once we have a better `MemoryResource`, and for that we might need [`pool_memory_resource` from RMM](https://github.com/rapidsai/rmm#pool_memory_resource) or perhaps CCCL if it's...

enhancement

cuda.core

blocked