Leo Fang
> I think it's close to work (or already working) for PyTorch

Update: It's already working:
- https://developer.nvidia.com/blog/streamline-cuda-accelerated-python-install-and-packaging-workflows-with-wheel-variants/
- https://pytorch.org/blog/pytorch-wheel-variants/
- https://labs.quansight.org/blog/python-wheels-from-tags-to-variants
- https://astral.sh/blog/wheel-variants
Thanks, @rgommers!

> I'd start with repackaging the existing set of wheels (essentially just adding metadata); that is how the PyTorch 2.8.0 wheels were produced.

Do you have a pointer...
> cc/ @leofang This may affect conda-forge packaging (need to add [`zlib`](https://anaconda.org/conda-forge/zlib) as a dependency to cuDNN 8.3.0.)

Thanks, @kmaehashi. WIP in https://github.com/conda-forge/cudnn-feedstock/pull/38.
Hi, wearing my Python array API (& DLPack) hat, I'd like to chime in here. As much as I'd love to have DLPack support auto-generated, there exists one gap as...
This might also be testable on regular x86-64 CIs!
This name is confusing for anyone with basic CUDA knowledge:

```c++
A.wait();
```

Can we just call it `A.synchronize();` or `A.sync();` (matching cuda.core)?
Blocked by #104.
Hi @ajschmidt8, I noticed that the arm64 runner actually has 4 GPUs: https://github.com/NVIDIA/cuda-python/actions/runs/12308859180/job/34356159459#step:5:30 Does this mean we can use it for multi-GPU tests (for which we have none today; this...
Updated the title as the 1-GPU runner was added in #289. Spoke with AJ and the multi-GPU runner is currently blocked.
@jakirkham suggested using Miniforge GHA: https://github.com/NVIDIA/cuda-python/pull/289#discussion_r1883278334. Good to know it exists! 🙏