Sudhakar Singh
Was this issue resolved? @iwldzt3011
Closing since there has been no activity and no additional info was provided.
Was this issue resolved? @stefanozampini
Closing due to inactivity; feel free to reopen with more info! --- Edit: For such issues, it's a good idea to check that GPU communication is working correctly with [nccl-tests](https://github.com/NVIDIA/nccl-tests)
@noanabeshima was this resolved?
@milmor is this resolved now?
Was this issue resolved? @Dinple
I reran the colab repro. On Colab GPU, I get values which are fairly consistently large and similar (with some occasional `nan`, `inf`) ``` [0.0000000e+00 9.8054398e+10 1.9872968e+11 6.2595138e+10 2.0462353e+11 2.4414941e+10...
@yxd886 Multi-host for GPUs was added recently. Here's the documentation: https://jax.readthedocs.io/en/latest/multi_process.html#initializing-the-cluster. Feel free to start a new thread if you're still facing issues.
@chrisgrimm were you able to resolve this?