Leo Fang
> I think it's close to work (or already working) for PyTorch

Update: It's already working:
- https://developer.nvidia.com/blog/streamline-cuda-accelerated-python-install-and-packaging-workflows-with-wheel-variants/
- https://pytorch.org/blog/pytorch-wheel-variants/
- https://labs.quansight.org/blog/python-wheels-from-tags-to-variants
- https://astral.sh/blog/wheel-variants
Thanks, @rgommers!

> I'd start with repackaging the existing set of wheels (essentially just adding metadata); that is how the PyTorch 2.8.0 wheels were produced.

Do you have a pointer...
> cc/ @leofang This may affect conda-forge packaging (need to add [`zlib`](https://anaconda.org/conda-forge/zlib) as a dependency to cuDNN 8.3.0.)

Thanks, @kmaehashi. WIP in https://github.com/conda-forge/cudnn-feedstock/pull/38.
Hi, wearing my Python array API (& DLPack) hat, I'd like to chime in here. As much as I'd love to have DLPack support auto-generated, there exists one gap as...
This might also be testable on regular x86-64 CIs!
This name is confusing for anyone with basic CUDA knowledge:

```c++
A.wait();
```

Can we just call it `A.synchronize();` or `A.sync();` (matching cuda.core)?
Blocked by #104.
Hi @ajschmidt8, I noticed that the arm64 runner actually has 4 GPUs: https://github.com/NVIDIA/cuda-python/actions/runs/12308859180/job/34356159459#step:5:30 Does this mean we can use it for multi-GPU tests (for which we have none today; this...
Updated the title as the 1-GPU runner was added in #289. Spoke with AJ and the multi-GPU runner is currently blocked.
@jakirkham suggested using Miniforge GHA: https://github.com/NVIDIA/cuda-python/pull/289#discussion_r1883278334. Good to know it exists! 🙏