Jayllia
> > Did you fix this problem? > > Excuse me, have you fixed it? I ran into the same problem. How did you deal with it? Thanks
> Hi @patk-motional, I ran into a failure when trying to **train on the full dataset** with DDPPlugin **after caching locally**. Loading the full dataset from the cache files takes more than 30 minutes, which causes an error originating from "torch/distributed/distributed_c10d.py:460": `INFO {/opt/conda/envs/nuplan/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:460} Waiting in store based barrier to initialize process group for rank: 0, key: store_based_barrier_key:1 (world_size=4, worker_count=1, timeout=0:30:00)` `Timed out initializing...
> When training with the full dataset, I always encounter one of the following two issues: > > 1. When training with a single worker, I get an error message...