Ziyan Chen
Same here. Anyone got solutions?
The code uses `accelerate` to handle DDP automatically.
I hit the same problem here. Multi-GPU training gets stuck at step 1, while single-GPU training works fine. I did some debugging. The first step always works fine until the second...
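For anyone trying to narrow down a hang like this, here is a minimal sketch of one way to see where each rank is stuck. It is not taken from the posts above. The helper names `arm_hang_watchdog` and `disarm_hang_watchdog` and the 120-second timeout are hypothetical. `NCCL_DEBUG` and `TORCH_DISTRIBUTED_DEBUG` are environment variables read by NCCL and PyTorch; the sketch assumes they are set before the process group is initialized.

```python
import faulthandler
import os
import sys


def arm_hang_watchdog(timeout_s: float = 120.0) -> dict:
    """Hypothetical helper: call at the top of the training script on every rank."""
    # Ask NCCL and torch.distributed for verbose logs. setdefault keeps any
    # value that was already exported in the shell.
    os.environ.setdefault("NCCL_DEBUG", "INFO")
    os.environ.setdefault("TORCH_DISTRIBUTED_DEBUG", "DETAIL")
    # If the process is still alive after timeout_s seconds, dump the Python
    # stack of every thread to stderr, then repeat every timeout_s seconds.
    faulthandler.dump_traceback_later(timeout_s, repeat=True, file=sys.stderr)
    return {k: os.environ[k] for k in ("NCCL_DEBUG", "TORCH_DISTRIBUTED_DEBUG")}


def disarm_hang_watchdog() -> None:
    # Cancel the pending dump once training finishes normally.
    faulthandler.cancel_dump_traceback_later()
```

If a rank hangs, its stack dump should show which call it is blocked in, which can be compared across ranks.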
I've done some debugging. I believe several things could be causing this hang, such as my Linux kernel being too old to support the latest versions of torch and accelerate,...
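To compare setups across the people reporting this, a small script like the following could collect the relevant versions. This is only a sketch: the function name `collect_env` is hypothetical, and it reports the installed versions without checking whether any particular kernel is supported.

```python
import importlib.metadata
import platform


def collect_env() -> dict:
    """Hypothetical helper: gather kernel, Python, torch and accelerate versions."""
    info = {
        "kernel": platform.release(),          # e.g. the Linux kernel release string
        "python": platform.python_version(),
    }
    for pkg in ("torch", "accelerate"):
        try:
            info[pkg] = importlib.metadata.version(pkg)
        except importlib.metadata.PackageNotFoundError:
            info[pkg] = None                   # package is not installed
    return info


if __name__ == "__main__":
    for name, value in collect_env().items():
        print(f"{name}: {value}")
```

Posting this output alongside a report makes it easier to tell whether the hang correlates with a specific kernel or library version.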