cosypose
Multi-gpu on a single node
Hello, I can successfully run 'Single gpu on a single node', but when I try 'Multi-gpu on a single node', I get the following error:
Traceback (most recent call last):
  File "/home/sevati/anaconda3/envs/cosypose/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/sevati/anaconda3/envs/cosypose/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/sevati/anaconda3/envs/cosypose/lib/python3.7/site-packages/cosypose-1.0.0-py3.7-linux-x86_64.egg/cosypose/scripts/run_cosypose_eval.py", line 16, in <module>
    from cosypose.config import EXP_DIR, MEMORY, RESULTS_DIR, LOCAL_DATA_DIR
  File "/home/sevati/anaconda3/envs/cosypose/lib/python3.7/site-packages/cosypose-1.0.0-py3.7-linux-x86_64.egg/cosypose/config.py", line 33, in <module>
    assert LOCAL_DATA_DIR.exists()
AssertionError
Setting OMP and MKL num threads to 1.
Why is LOCAL_DATA_DIR checked by the config.py under python3.7/site-packages/cosypose-1.0.0-py3.7-linux-x86_64.egg/cosypose/ and not by the one in projects/cosypose/cosypose?
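The egg path in the traceback suggests that Python is importing an installed copy of the package rather than the source checkout. One way to confirm which copy gets imported is to ask the import system directly. The helper name `module_origin` below is made up for illustration; only `importlib.util.find_spec` is a standard library call.

```python
import importlib.util
from typing import Optional


def module_origin(name: str) -> Optional[str]:
    """Return the file a top-level module would be imported from, or None."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


if __name__ == "__main__":
    # Run this inside the same conda environment that raises the error.
    print(module_origin("cosypose"))
```

If the printed path points into site-packages, the scripts are reading the installed config.py and not the one in the project directory.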
And now the error has changed to:
RuntimeError: NCCL error in: /pytorch/torch/lib/c10d/ProcessGroupNCCL.cpp:784, unhandled cuda error, NCCL version 2.7.8
Setting OMP and MKL num threads to 1.
@Arrebol2020 Hello, have you solved it?
I didn't solve it, so I tried implementing DDP myself, and it seems to work.
@Arrebol2020 could you share your work?
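For readers landing here: the DDP code mentioned above was never posted in this thread. What follows is only a minimal sketch of single-node DistributedDataParallel in plain PyTorch, not cosypose's launcher and not @Arrebol2020's implementation. The toy `Linear` model, the `worker` and `pick_backend` names, and the address and port values are all placeholders chosen for illustration.

```python
import os

import torch
import torch.distributed as dist
import torch.multiprocessing as mp
from torch.nn.parallel import DistributedDataParallel as DDP


def pick_backend() -> str:
    # NCCL requires CUDA devices; gloo also runs on CPU-only machines.
    return "nccl" if torch.cuda.is_available() else "gloo"


def worker(rank: int, world_size: int) -> None:
    # Rendezvous settings for a single node (placeholder address and port).
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    backend = pick_backend()
    dist.init_process_group(backend, rank=rank, world_size=world_size)

    if backend == "nccl":
        # Bind this process to its own GPU before doing any CUDA work.
        torch.cuda.set_device(rank)
        device = torch.device("cuda", rank)
        device_ids = [rank]
    else:
        device = torch.device("cpu")
        device_ids = None

    model = torch.nn.Linear(8, 2).to(device)  # toy stand-in for the real model
    ddp_model = DDP(model, device_ids=device_ids)

    # One forward/backward pass; DDP averages gradients across processes.
    loss = ddp_model(torch.randn(4, 8, device=device)).sum()
    loss.backward()

    dist.destroy_process_group()


def main() -> None:
    # One process per visible GPU, or two CPU processes as a fallback.
    world_size = torch.cuda.device_count() if torch.cuda.is_available() else 2
    mp.spawn(worker, args=(world_size,), nprocs=world_size)


if __name__ == "__main__":
    main()
```

The sketch pins each process to one GPU with `torch.cuda.set_device(rank)` before any CUDA work, so that each rank uses a different device. Whether that relates to the NCCL error reported above is not established in this thread.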