Leo Fang

Results 278 issues of Leo Fang

We support

- Python scalars
- NumPy scalars
- ctypes scalars

These should all be tested in `test_launcher.py`.

triage
P1
test
cuda.core

We wanted to do this, but so far it seems we've only covered `Stream.from_handle`. We also want to cover these objects:

- `Program`
- `ObjectCode`
- `Kernel`
- `Buffer`

triage
P1
feature
cuda.core

- Support ctypes/numpy structs
- make sure ctypes is deprioritized
- Support converting arbitrary objects to `StridedMemoryView`
- Benchmarking
- measure `launch()` overhead
- reimplement type dispatcher via dict lookup...
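
As a rough illustration of the "type dispatcher via dict lookup" item above, here is a minimal sketch, not the actual `cuda.core` implementation. The names `_CONVERTERS` and `prepare_arg` are hypothetical, and the mapping of Python `int` to a 64-bit integer and `float` to a double is an assumption made only for this example. The idea is that a single dictionary lookup keyed on the exact argument type replaces a chain of `isinstance` checks.

```python
import ctypes


def _identity(arg):
    # ctypes scalars are already in a C-compatible form; pass them through.
    return arg


# Hypothetical dispatch table: exact argument type -> converter.
# The int -> c_int64 and float -> c_double choices are assumptions.
_CONVERTERS = {
    bool: ctypes.c_bool,
    int: ctypes.c_int64,
    float: ctypes.c_double,
    ctypes.c_bool: _identity,
    ctypes.c_int32: _identity,
    ctypes.c_int64: _identity,
    ctypes.c_float: _identity,
    ctypes.c_double: _identity,
}


def prepare_arg(arg):
    """Convert a scalar argument to a ctypes object via one dict lookup."""
    try:
        converter = _CONVERTERS[type(arg)]
    except KeyError:
        raise TypeError(f"unsupported argument type: {type(arg).__name__}") from None
    return converter(arg)
```

Because the lookup uses the exact type, `True` is dispatched as `bool` rather than falling through to `int`. Subclasses of the listed types are not matched in this sketch; a real dispatcher would need a fallback path for them.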

enhancement
triage
P1
cuda.core

> This is a bit trickier than I thought, because we also need the dict key "old"/"new" as a proxy to prepare for `args` (which is different for `cuModuleLoadDataEx`/`cuLibraryLoadData`). Let's...

enhancement
P2
cuda.core

Currently it takes 3.4 - 3.45 us (depending on stream-ordering or not) to create a memory view object:

```python
In [4]: x = cp.empty((23, 4))
In [7]: %timeit s =...
```

enhancement
triage
P1
cuda.core

@gigony complained that the following example is unclear compared to CuPy's RawKernel, and I agree.

https://github.com/NVIDIA/cuda-python/blob/e1e332564c48db556212d59262a149b1a63285e8/docs_src/source/overview.md?plain=1#L208-L223

Specifically, how to pass typed pointers and scalars is really unclear. This also raises the...

documentation
P1
cuda.bindings

enhancement
P2
packaging
cuda.bindings

Follow-up of #208. We should revisit the simple wrapper over `cudaMalloc` once we have better `MemoryResource`, and for that we might need [`pool_memory_resource` from RMM](https://github.com/rapidsai/rmm#pool_memory_resource) or perhaps CCCL if it's...

enhancement
P1
cuda.core
blocked