Leo Fang
We support

- Python scalars
- NumPy scalars
- ctypes scalars

These should all be tested in `test_launcher.py`.
We wanted to do this, but so far it seems we've only covered `Stream.from_handle`. We also want to cover these objects:

- `Program`
- `ObjectCode`
- `Kernel`
- `Buffer`
- Support ctypes/numpy structs
  - make sure ctypes is deprioritized
- Support converting arbitrary objects to `StridedMemoryView`
- Benchmarking
  - measure `launch()` overhead
  - reimplement type dispatcher via dict lookup...
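To illustrate the "type dispatcher via dict lookup" item, here is a minimal sketch in plain Python. It is not the `cuda.core` implementation: the helper name `to_kernel_arg`, the `_CONVERTERS` table, and the choice of target C types (for example `int` to `c_int64`) are all assumptions made for illustration. The idea is that one dict lookup on the exact type replaces a chain of `isinstance` checks, and ctypes scalars are only checked afterwards, which is one way to read "deprioritized".

```python
import ctypes

# Hypothetical table: exact Python type -> ctypes constructor.
# The target C types are illustrative assumptions, not what cuda.core uses.
_CONVERTERS = {
    bool: ctypes.c_bool,
    int: ctypes.c_int64,
    float: ctypes.c_double,
}


def to_kernel_arg(value):
    """Return a ctypes scalar for `value` (hypothetical helper)."""
    # Fast path: a single dict lookup keyed on the exact type.
    # Using type(value) keeps bool from being treated as int.
    converter = _CONVERTERS.get(type(value))
    if converter is not None:
        return converter(value)
    # Checked last: ctypes scalars are passed through unchanged.
    if isinstance(value, ctypes._SimpleCData):
        return value
    raise TypeError(f"unsupported kernel argument type: {type(value).__name__}")
```

NumPy scalar types could presumably be registered in the same table; they are left out here to keep the sketch dependency-free. Timing such a helper with `timeit` against an `isinstance` chain would be one way to approach the benchmarking item.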
> This is a bit trickier than I thought, because we also need the dict key "old"/"new" as a proxy to prepare for `args` (which is different for `cuModuleLoadDataEx`/`cuLibraryLoadData`). Let's...
Currently it takes 3.4 - 3.45 us (depending on whether it is stream-ordered) to create a memory view object:

```python
In [4]: x = cp.empty((23, 4))
In [7]: %timeit s =...
```
@gigony complained that the following example is unclear compared to CuPy's RawKernel, and I agree. https://github.com/NVIDIA/cuda-python/blob/e1e332564c48db556212d59262a149b1a63285e8/docs_src/source/overview.md?plain=1#L208-L223 Specifically, how to pass typed pointers and scalars is really unclear. This also raises the...
Follow-up of #208. We should revisit the simple wrapper over `cudaMalloc` once we have better `MemoryResource`, and for that we might need [`pool_memory_resource` from RMM](https://github.com/rapidsai/rmm#pool_memory_resource) or perhaps CCCL if it's...