Ziyan Chen

4 comments by Ziyan Chen

Same here. Anyone got solutions?

The code uses `accelerate` to handle DDP automatically.

Ran into the same problem here. Multi-GPU training gets stuck at step 1, while single-GPU training works fine. I did some debugging. The first step always works fine until the second...

I've done some debugging. I believed several factors could be causing this hang, such as my Linux kernel being too old to support the latest versions of torch and accelerate,...
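
For anyone else chasing a hang like this, here is a minimal sketch of one generic way to see where a process is stuck. It uses only Python's standard `faulthandler` module. It is not taken from the comments above, and `hang_watchdog` is a hypothetical helper name, not part of torch or accelerate.

```python
import faulthandler
import sys
from contextlib import contextmanager


@contextmanager
def hang_watchdog(timeout_s, file=None):
    """Dump every thread's traceback if the wrapped block exceeds timeout_s.

    `hang_watchdog` is a hypothetical helper, not a torch or accelerate API.
    """
    # faulthandler writes to a raw file descriptor, so `file` must support fileno().
    target = sys.stderr if file is None else file
    # exit=False: only report where the process is stuck, do not kill it.
    faulthandler.dump_traceback_later(timeout_s, exit=False, file=target)
    try:
        yield
    finally:
        # The block finished (or raised), so disarm the pending dump.
        faulthandler.cancel_dump_traceback_later()
```

Wrapping a single training step in `with hang_watchdog(120):` should print a traceback from each process that is still inside the block after 120 seconds, which may help show which call the stuck ranks are waiting on. The timeout value is an arbitrary assumption and should be longer than a normal step.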