YueAn329
I am trying to use accelerate.notebook_launcher for distributed multi-GPU training. When training an image classification model, constructing an IterableDataset causes training to fail with the following error:
```
[E ProcessGroupGloo.cpp:138] Rank 1 successfully reached monitoredBarrier, but received errors while waiting for send/recv from rank 0. Please check rank 0 logs for faulty rank.
Traceback (most recent...
```