res-loglikelihood-regression

Train error

Open · DUhaixia opened this issue 2 years ago · 1 comment

When training on a single GPU, running train.py fails with this error:

```
Traceback (most recent call last):
  File "D:/rhnet-daima/res-loglikelihood-regression-master/res-loglikelihood-regression-master/scripts/train.py", line 172, in <module>
    main()
  File "D:/rhnet-daima/res-loglikelihood-regression-master/res-loglikelihood-regression-master/scripts/train.py", line 45, in main
    mp.spawn(main_worker, nprocs=ngpus_per_node, args=(opt, cfg))
  File "D:\anaconda3\envs\PyTorch_YOLOv4-master\lib\site-packages\torch\multiprocessing\spawn.py", line 199, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "D:\anaconda3\envs\PyTorch_YOLOv4-master\lib\site-packages\torch\multiprocessing\spawn.py", line 157, in start_processes
    while not context.join():
  File "D:\anaconda3\envs\PyTorch_YOLOv4-master\lib\site-packages\torch\multiprocessing\spawn.py", line 118, in join
    raise Exception(msg)
Exception:

-- Process 0 terminated with the following error:
Traceback (most recent call last):
  File "D:\anaconda3\envs\PyTorch_YOLOv4-master\lib\site-packages\torch\multiprocessing\spawn.py", line 19, in _wrap
    fn(i, *args)
  File "D:\rhnet-daima\res-loglikelihood-regression-master\res-loglikelihood-regression-master\scripts\train.py", line 55, in main_worker
    init_dist(opt)
  File "D:\rhnet-daima\res-loglikelihood-regression-master\res-loglikelihood-regression-master\rlepose\utils\env.py", line 24, in init_dist
    world_size=opt.world_size, rank=opt.rank)
  File "D:\anaconda3\envs\PyTorch_YOLOv4-master\lib\site-packages\torch\distributed\distributed_c10d.py", line 434, in init_process_group
    init_method, rank, world_size, timeout=timeout
  File "D:\anaconda3\envs\PyTorch_YOLOv4-master\lib\site-packages\torch\distributed\rendezvous.py", line 82, in rendezvous
    raise RuntimeError("No rendezvous handler for {}://".format(result.scheme))
RuntimeError: No rendezvous handler for tcp://
```

DUhaixia · May 12 '22 08:05

Hi @DUhaixia, you should change WORLD_SIZE in the config file to 1 when you use one GPU for training.

Jeff-sjtu · May 13 '22 08:05
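The maintainer's suggestion (set WORLD_SIZE to 1 for single-GPU training) can be expressed as a small pre-flight check run before `mp.spawn(...)` and `init_dist(opt)`. This is a hypothetical sketch, not code from the repository: the helper name `check_world_size`, the key path `cfg["TRAIN"]["WORLD_SIZE"]`, and the assumption of one worker process per GPU (as suggested by `mp.spawn(main_worker, nprocs=ngpus_per_node, ...)` in the traceback) are all illustrative and may not match the project's actual config layout.

```python
# Hypothetical pre-flight check; not part of res-loglikelihood-regression.
# Assumptions: the YAML config is loaded as a nested dict with the key path
# cfg["TRAIN"]["WORLD_SIZE"], and training spawns one process per GPU.


def check_world_size(cfg: dict, ngpus_per_node: int, num_nodes: int = 1) -> int:
    """Return how many worker processes to spawn on this node.

    Raises ValueError when the configured WORLD_SIZE does not equal the total
    number of processes that will join the process group.
    """
    world_size = cfg["TRAIN"]["WORLD_SIZE"]
    expected = ngpus_per_node * num_nodes  # one process per GPU, all nodes
    if world_size != expected:
        raise ValueError(
            f"WORLD_SIZE={world_size} but {expected} process(es) will be "
            f"spawned; set WORLD_SIZE to {expected} in the config"
        )
    return ngpus_per_node


if __name__ == "__main__":
    single_gpu_cfg = {"TRAIN": {"WORLD_SIZE": 1}}
    print(check_world_size(single_gpu_cfg, ngpus_per_node=1))
```

With a single GPU the check passes only when WORLD_SIZE is 1; a multi-GPU value such as 8 left over from a default config raises immediately instead of failing later inside the distributed setup.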