Leo Fang

1175 comments by Leo Fang

Looks like I dropped the ball... 😅 From my perspective as a low-level library developer trying to interface with Python (e.g. [nvmath-python](https://github.com/NVIDIA/nvmath-python/) is live), I don't care if DLPack or...

Please apply this patch: https://github.com/conda-forge/cupy-feedstock/blob/main/recipe/fix_cub_constexpr.diff which seems to be required for vs2019+ (xref: https://github.com/conda-forge/cupy-feedstock/pull/261#discussion_r1573027806). I will find time to file a PR tomorrow.

Q: Would this only cover libcudacxx, or CUB too?

There's no `windows-amd64-gpu-rtx2080-latest-1` runner so the CI hangs.

```
libcudacxx: Pass: 50%/2
```
Is this a concern (that it's not 100%)?

There is one bug in cuda.bindings. The bug is that when `cuCheckpointProcessGetState` fails, `state` is randomly populated (or not touched by the driver at all, it doesn't matter for our...

> do you want me to add bandit / codeql to pre-commit before we merge this?

I think it is fine to do it in a separate PR, so we...

We'll have to cache CC on a per-`Device` object level to bring this down to O(10) ns level.

```python
In [32]: def get_cc(dev):
    ...:     if dev in data:
    ...:         return...
```
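
For readers skimming the truncated snippet above, here is a minimal sketch of what per-object caching of the compute capability (CC) could look like. The `Device` class, the `_query_compute_capability` helper, and the `(8, 0)` value are all hypothetical stand-ins for illustration, not the actual cuda.core implementation; the real query would go through the driver.

```python
class Device:
    """Hypothetical stand-in for a device handle (not the real cuda.core class)."""

    def __init__(self, device_id):
        self.device_id = device_id
        self._cc = None        # per-object cache slot for the compute capability
        self.query_count = 0   # counts how often the slow path runs

    def _query_compute_capability(self):
        # Placeholder for the expensive driver query; the value is made up.
        self.query_count += 1
        return (8, 0)

    @property
    def compute_capability(self):
        # Run the slow query only on first access, then reuse the stored value.
        if self._cc is None:
            self._cc = self._query_compute_capability()
        return self._cc


dev = Device(0)
first = dev.compute_capability    # slow path: performs the query
second = dev.compute_capability   # fast path: plain attribute read
```

Storing the value on the object itself avoids a dictionary lookup keyed by the device, which is presumably the kind of overhead the comment is trying to shave off; whether this reaches the stated target would need to be measured.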

I did some refactoring of `Device.__new__()` to replace cudart APIs by driver APIs, and found the perf gets even worse. Out of curiosity, I did this quick profiling and got...