Sudhakar Singh

70 comments by Sudhakar Singh

Closing since there has been no activity and no additional info was provided.

Was this issue resolved? @stefanozampini

Closing due to inactivity; feel free to reopen with more info! --- Edit: For such issues, it's a good idea to check that communication between GPUs works correctly with [nccl-tests](https://github.com/NVIDIA/nccl-tests).

@noanabeshima was this resolved?

@milmor is this resolved now?

I reran the Colab repro. On a Colab GPU, I get values that are fairly consistently large and similar (with an occasional `nan` or `inf`):
```
[0.0000000e+00 9.8054398e+10 1.9872968e+11 6.2595138e+10 2.0462353e+11 2.4414941e+10...
```

@yxd886 Multi-host support for GPUs was added recently. Here's the documentation: https://jax.readthedocs.io/en/latest/multi_process.html#initializing-the-cluster. Feel free to start a new thread if you're still facing issues.
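
For anyone landing here, a minimal sketch of what cluster initialization might look like on each host, assuming the coordinator address, process count, and process id are passed in through environment variables. The variable names `COORDINATOR_ADDRESS`, `NUM_PROCESSES`, and `PROCESS_ID` and the helpers `cluster_kwargs` and `init_multi_host` are hypothetical and only for illustration; the linked documentation is the authoritative reference for `jax.distributed.initialize`.

```python
import os


def cluster_kwargs(env=os.environ):
    """Collect arguments for jax.distributed.initialize from environment variables.

    COORDINATOR_ADDRESS, NUM_PROCESSES and PROCESS_ID are hypothetical names
    chosen for this sketch; JAX itself does not define them.
    """
    return {
        "coordinator_address": env["COORDINATOR_ADDRESS"],  # e.g. "10.0.0.1:1234"
        "num_processes": int(env["NUM_PROCESSES"]),
        "process_id": int(env["PROCESS_ID"]),
    }


def init_multi_host(env=os.environ):
    # Imported here so cluster_kwargs can be used without JAX installed.
    import jax

    # Call this once per process, before running any other JAX computation.
    jax.distributed.initialize(**cluster_kwargs(env))

    # After initialization, device_count() covers every host in the cluster,
    # while local_device_count() covers only the devices attached to this host.
    return jax.process_index(), jax.device_count(), jax.local_device_count()
```

Each host would run the same script with its own `PROCESS_ID`; comparing the returned global and local device counts is a quick way to see whether the hosts found each other.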

@chrisgrimm were you able to resolve this?