Leo Fang comments

Results: 1175 comments of Leo Fang

Querying current device is slow compared to CuPy

Accessing `Device().compute_capability` is being addressed in #459. Let me re-label this issue to track the remaining binding performance issue.

@rwgk reported that `cuDriverGetVersion` is also sluggish when called repeatedly in a busy loop.
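One way to put a number on "sluggish" is to time the call in a tight loop and compare it against a no-op baseline. This is a minimal sketch under stated assumptions: `per_call_ns` and `fake_driver_get_version` are hypothetical names, and the stand-in only mimics the `(error, value)` return shape that `cuda.bindings` driver calls use; its version number is illustrative.

```python
import time
from typing import Callable


def per_call_ns(fn: Callable[[], object], n: int = 100_000) -> float:
    """Average wall-clock cost of one call to ``fn``, in nanoseconds.

    ``fn`` is called ``n`` times in a tight loop, so fixed per-call overhead
    (argument parsing, return-value boxing) dominates the measurement. The
    loop itself also contributes, hence the no-op baseline below.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    start = time.perf_counter_ns()
    for _ in range(n):
        fn()
    return (time.perf_counter_ns() - start) / n


def fake_driver_get_version():
    # Hypothetical stand-in for a binding that returns (error_code, value);
    # the numbers are illustrative only. Replace with the real call to measure it.
    err, version = 0, 12040
    return err, version


if __name__ == "__main__":
    baseline = per_call_ns(lambda: None)
    cost = per_call_ns(fake_driver_get_version)
    print(f"{cost:.1f} ns/call ({cost - baseline:.1f} ns above a no-op call)")
```

To time the real API instead, pass `cuda.bindings.driver.cuDriverGetVersion` in place of the stand-in (assuming a machine with a CUDA driver installed).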

Yes, see https://github.com/NVIDIA/cuda-python/issues/439#issuecomment-2673234572. Right now the problem is in `cuda.bindings`, not `cuda.core`. I have changed the issue label to reflect this status.

Wow! Great findings, Vlad! It is insane how slow `IntEnum` (or any `Enum` subclass from the standard library) is... I wonder if it makes sense to build an internal cache ourselves?...

(Your fast path is also reasonable FWIW, just wonder if this is worth our efforts.)
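The fast path itself is not shown in these comments. One common shape for such a fast path is to short-circuit the overwhelmingly frequent success code and only pay for full enum construction on errors. The sketch below is a hypothetical illustration of that idea; `Result`, `_SUCCESS`, and `check` are made-up names, not part of any real API.

```python
import enum


class Result(enum.IntEnum):
    # Hypothetical result enum; 0 plays the role of a SUCCESS code.
    SUCCESS = 0
    INVALID_VALUE = 1


# Bound once so the hot path avoids a class attribute lookup per call.
_SUCCESS = Result.SUCCESS


def check(code: int) -> Result:
    """Map a raw code to Result, short-circuiting the common success case."""
    if code == 0:
        # Fast path: most calls succeed, so skip the enum constructor entirely.
        return _SUCCESS
    # Slow path: full IntEnum construction (raises ValueError on unknown codes).
    return Result(code)
```

The trade-off against a lookup cache: this only speeds up the success case, but it adds no extra data structure, whereas a cache speeds up every known code at the cost of keeping a dict in sync with the enum.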

My take from comparing version 1 and version 2 is that we wasted 100% overhead (60->120ns) just to create a tuple... We may want to think seriously about breaking the...

> My take from comparing version 1 and version 2 is that we wasted 100% overhead (60->120ns) just to create a tuple...

I read it wrong. Creating the return tuple...
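Since the first reading of the numbers turned out to be wrong, it helps to isolate the cost of the tuple directly rather than infer it. The comments do not say what version 1 and version 2 were; this sketch assumes one returns a bare value and the other an `(error, value)` pair. `returns_value`, `returns_pair`, and `compare` are hypothetical helpers. Note that the tuple must be built from a variable: CPython folds a literal like `(0, 12040)` into a constant, which would hide the allocation being measured.

```python
import timeit


def returns_value(value: int) -> int:
    # Hands the payload straight back: no container is built.
    return value


def returns_pair(value: int) -> tuple:
    # Mimics an (error, value) calling convention: because ``value`` is a
    # variable, CPython builds a fresh 2-tuple on every call.
    return (0, value)


def compare(number: int = 1_000_000) -> dict:
    """Return total seconds spent on ``number`` calls of each variant."""
    return {
        "value": timeit.timeit(lambda: returns_value(12040), number=number),
        "pair": timeit.timeit(lambda: returns_pair(12040), number=number),
    }


if __name__ == "__main__":
    n = 1_000_000
    for name, secs in compare(n).items():
        print(f"{name:>5}: {secs * 1e9 / n:.1f} ns/call")
```

Both variants go through the same lambda and call overhead, so the difference between the two timings approximates the cost of building the tuple alone.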

[BUG]: Latest CCCL breaks CuPy

> Build https://github.com/cupy/cupy/pull/8412 from source

To unblock myself, I've force-pushed back to yesterday's snapshot, but starting from commit https://github.com/cupy/cupy/commit/08e6a3c63fe734c259a83f76381ed973d99cd7bd the issue should be reproducible. By tracing the code...

Hi Bernhard, thanks for your reply.

> which relies on `thrust::less` having an actual `operator()`. We changed this in https://github.com/NVIDIA/cccl/pull/1872
> ...
> (we cannot do this yet, because we...

I had a somewhat lengthy discussion with Jake offline. Below is a summary of what I asked regarding specializing `thrust::less` vs `thrust::less::operator` noted above (and the offline thread), for posterity:...