Ricky Das

13 comments by Ricky Das

@JackCaoG Do you have any insights on this issue?

PyTorch can do it because they are not doing distributed training using the trick XLA uses. For them it's native to torch itself; they use DDP modules...
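
For reference, a minimal sketch of what native DDP usage looks like in plain PyTorch (the nccl backend, the toy model, and the assumption that a launcher like torchrun sets LOCAL_RANK are all illustrative, not from the original thread):

```python
import os

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

# Sketch only: assumes one process per GPU, started by a launcher
# (e.g. torchrun) that sets LOCAL_RANK for each worker.
dist.init_process_group(backend="nccl")
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

# DDP wraps the module and handles the gradient all-reduce natively,
# which is why PyTorch does not need the XLA-style device trick.
model = torch.nn.Linear(10, 10).cuda(local_rank)
ddp_model = DDP(model, device_ids=[local_rank])
```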

Excellent! That should solve the problem for now. I will try it out and post an update here, but it should work for sure. It is true that the trick of CUDA_VISIBLE_DEVICES...
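
In case it helps others, this is roughly what the CUDA_VISIBLE_DEVICES trick amounts to, as I understand it (the LOCAL_RANK variable is an assumption about how the launcher identifies each worker):

```python
import os

# Assumed: the launcher sets LOCAL_RANK per worker process.
local_rank = int(os.environ.get("LOCAL_RANK", "0"))

# Mask all but one GPU *before* torch is imported, so each worker
# only ever sees its own device as cuda:0.
os.environ["CUDA_VISIBLE_DEVICES"] = str(local_rank)

import torch

print(torch.cuda.device_count())  # 1
```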

I am also facing the same issue. Essentially, I get this problem when I try to train the model in a distributed manner. I tried every adjustment of the regularization parameters,...

Update on my issue: there was a problem with one of my GPUs in my multi-node, multi-GPU setup. Some gate must have been broken.

Closing this issue since there has been no activity.

I think we can use xla::norm directly. It supports p=0 and p=inf, so it should address all your comments.
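
To illustrate the semantics only (this is the Python-side torch.linalg.vector_norm, not xla::norm itself; it just shows what p=0 and p=inf compute):

```python
import torch

x = torch.tensor([3.0, 0.0, -4.0])

# ord=0 counts the nonzero entries; ord=inf takes the max absolute value.
print(torch.linalg.vector_norm(x, ord=0))             # tensor(2.)
print(torch.linalg.vector_norm(x, ord=float("inf")))  # tensor(4.)
```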

Same here; I always comment out the Normalization while training.