Nicolas Castet

Results: 21 comments by Nicolas Castet

@yizhang2077 Is it reproducible on your side? I used one of my container images based on commit 9f635ea50de920aa507f486daafba26a5b837574 on an 8xH200 box and could not reproduce the failure with or...

@yizhang2077 Thanks, let me try your container image. This might be related: > CUDA RNG operations are permitted, and when using multiple torch.Generator instances within a graph, they must be registered...

The failure did not happen for me on pytorch 2.7 but it does on 2.5. While debugging torch.compile: ``` /usr/local/lib/python3.10/dist-packages/torch/_dynamo/variables/functions.py:725: UserWarning: Graph break due to unsupported builtin None._SimpleCData.__new__. This function...

> UserWarning: Graph break due to unsupported builtin None._SimpleCData.__new__. ... > @nvcastet It is weird, since pynccl allreduce is also in critical path and is graphable. The message (displayed using...

@ispobock when downloading `nvidia-nccl-cu11`, I see `cu116`: ``` # pip download nvidia-nccl-cu11 Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com/ Collecting nvidia-nccl-cu11 Downloading https://developer.download.nvidia.com/compute/redist/nvidia-nccl-cu11/nvidia-nccl-cu11-2022.5.19.tar.gz (16 kB) Preparing metadata (setup.py) ... done Collecting nvidia-nccl-cu116...

@yizhang2077 @zhyncs I went back to the first commit and registered the pynccl allgather as a PyTorch custom op, as you suggested. Ideally, it would be nice to get rid...

@ispobock Thanks, I fixed the lint.

@EricHallahan Thanks a lot for raising this issue and the thorough discussion! We detect current OMPI with ORTE via the presence of the env var `OMPI_MCA_orte_hnp_uri`: https://github.com/google/jax/blob/main/jax/_src/clusters/ompi_cluster.py#L28-L29 For OpenMPI with...
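
As a rough illustration of the env-var check described in the comment above, here is a minimal sketch. The helper name `launched_by_ompi_orte` and its `env` parameter are hypothetical and are not taken from the JAX source. The only detail taken from the comment is that the presence of `OMPI_MCA_orte_hnp_uri` is used as the signal.

```python
import os


def launched_by_ompi_orte(env=None):
    """Return True if the process appears to be launched by OpenMPI with ORTE.

    Hypothetical helper: it only checks for the presence of the
    OMPI_MCA_orte_hnp_uri variable mentioned in the comment above.
    """
    # Allow a mapping to be passed in so the check can be exercised
    # without modifying the real process environment.
    env = os.environ if env is None else env
    return "OMPI_MCA_orte_hnp_uri" in env


if __name__ == "__main__":
    print("OpenMPI/ORTE detected:", launched_by_ompi_orte())
```

A presence check like this only recognizes launchers that export that variable, so a launcher that does not set it would go undetected by this sketch.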

You can use auto-detection via mpi4py for that. See https://github.com/google/jax/pull/20174

@Fridge003 I think you are correct: it should be compatible, since trt_allreduce_fusion has its own workspace allocation (unlike the custom-allreduce kernel, which registers existing tensors). It means @gracehonv's bug on...