Vyas Ramasubramani

Results 899 comments of Vyas Ramasubramani

OK, after some further research it looks like this cannot be done until CUDA 12.6. Pinned memory pools and asynchronous allocation were not supported until then. The pinned memory type...

Whoops, my previous message was supposed to include links but I didn't copy them in. Yes, I'm basically just proposing that we do a [`cudaMemPoolCreate`](https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__MEMORY__POOLS.html#group__CUDART__MEMORY__POOLS_1g8158cc4b2c0d2c2c771f9d1af3cf386e) with a memory type of...
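
Roughly, the shape of what I'm proposing is the sketch below (CUDA 12.6+ assumed; the location type and NUMA node id are placeholders, since the exact memory type isn't spelled out above):

```cpp
#include <cuda_runtime.h>
#include <cstdio>

int main() {
  // Describe a pool of pinned *host* memory rather than device memory.
  cudaMemPoolProps props{};
  props.allocType     = cudaMemAllocationTypePinned;
  props.handleTypes   = cudaMemHandleTypeNone;
  props.location.type = cudaMemLocationTypeHostNuma;  // placeholder memory type
  props.location.id   = 0;                            // NUMA node 0 (placeholder)

  cudaMemPool_t pool{};
  if (cudaMemPoolCreate(&pool, &props) != cudaSuccess) {
    std::fprintf(stderr, "pinned memory pools not supported by this driver/toolkit\n");
    return 1;
  }

  cudaStream_t stream{};
  cudaStreamCreate(&stream);

  // Stream-ordered (asynchronous) allocation and free out of the pinned pool.
  void* ptr = nullptr;
  cudaMallocFromPoolAsync(&ptr, 1 << 20, pool, stream);
  cudaFreeAsync(ptr, stream);
  cudaStreamSynchronize(stream);

  cudaStreamDestroy(stream);
  cudaMemPoolDestroy(pool);
  return 0;
}
```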

At a high level my suspicion is that we are explicitly doing things that Cython is not designed to support by creating trampolines that do not reflect the original layout,...

I think you should be able to reproduce this issue by building a project that uses cuda-python's Cython bindings via the new API (so `from cuda.bindings cimport ...`) and then at...

I'm a little hesitant to put this level of information into rmm, especially the application design piece of it. That feels better suited to CUDA or CCCL docs since using...

I agree with recommending the async resource as a default. The main exception is deciding how to talk about managed memory for larger-than-memory workloads.

> But first, a quick question:...
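
To make that concrete, here is a minimal sketch of the async-as-default setup against librmm's C++ API (the managed-memory alternative for larger-than-memory workloads is left commented out):

```cpp
#include <rmm/mr/device/cuda_async_memory_resource.hpp>
#include <rmm/mr/device/managed_memory_resource.hpp>
#include <rmm/mr/device/per_device_resource.hpp>

int main() {
  // Recommended default: the stream-ordered resource backed by cudaMallocAsync.
  rmm::mr::cuda_async_memory_resource async_mr;
  rmm::mr::set_current_device_resource(&async_mr);

  // The exception discussed above: for larger-than-memory workloads, managed
  // (unified) memory can oversubscribe the GPU by paging to host memory.
  // rmm::mr::managed_memory_resource managed_mr;
  // rmm::mr::set_current_device_resource(&managed_mr);

  // ... anything allocating through RMM's current device resource now goes
  // through the async resource.
  return 0;
}
```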

> why do we need the RMM abstraction if the CUDA driver already handles memory efficiently

The rmm abstraction predates the CUDA driver functionality (and certainly predates it being broadly available) 🙂...

FWIW [here's my proposal for cudf](https://github.com/rapidsai/cudf/issues/17626).

@leofang could you please give me an example of what you're referring to?

Couldn't cuda-python unconditionally load both sets of symbols and then dispatch appropriately at runtime based on which one we wanted?
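
Something along these lines, in other words (placeholder library and symbol names; this only illustrates "load both, dispatch at runtime", not cuda-python's actual loader):

```cpp
#include <dlfcn.h>
#include <cstdio>

// Stand-in signature for whichever entry point is being dispatched.
using cuInit_t = int (*)(unsigned int);

int main(int argc, char**) {
  // Unconditionally load both candidate symbol sets up front.
  // (The two SONAMEs here are placeholders for the two sets in question.)
  void* set_a = dlopen("libcuda.so.1", RTLD_NOW | RTLD_LOCAL);
  void* set_b = dlopen("libcuda.so",   RTLD_NOW | RTLD_LOCAL);

  // Resolve the same entry point from each handle.
  auto init_a = set_a ? reinterpret_cast<cuInit_t>(dlsym(set_a, "cuInit")) : nullptr;
  auto init_b = set_b ? reinterpret_cast<cuInit_t>(dlsym(set_b, "cuInit")) : nullptr;

  // Dispatch at runtime based on which set was requested.
  cuInit_t cu_init = (argc > 1) ? init_a : init_b;
  if (!cu_init) {
    std::fprintf(stderr, "requested symbol set is unavailable\n");
    return 1;
  }
  std::printf("cuInit -> %d\n", cu_init(0));
  return 0;
}
```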