mmtracking

Error when using dist_train/dist_test

Open • gsygsy96 opened this issue 4 years ago • 4 comments

Hello! Sorry to bother you again, but I have run into a new problem that confuses me a lot: dist_test.sh and dist_train.sh do not seem to work. When I run dist_test.sh with the command

    bash ./tools/dist_test.sh configs/mot/tracktor/tracktor_faster-rcnn_r50_fpn_4e_mot17-public-half.py 2 --eval track

I got the error below:

    TypeError: can't pickle _thread.RLock objects
        return Popen(process_obj)
      File "/usr/local/miniconda3/lib/python3.6/multiprocessing/popen_spawn_posix.py", line 32, in __init__
        super().__init__(process_obj)
      File "/usr/local/miniconda3/lib/python3.6/multiprocessing/popen_fork.py", line 19, in __init__
        self._launch(process_obj)
      File "/usr/local/miniconda3/lib/python3.6/multiprocessing/popen_spawn_posix.py", line 47, in _launch
        reduction.dump(process_obj, fp)
      File "/usr/local/miniconda3/lib/python3.6/multiprocessing/reduction.py", line 60, in dump
        ForkingPickler(file, protocol).dump(obj)
    TypeError: can't pickle _thread.RLock objects
    Traceback (most recent call last):
      File "/usr/local/miniconda3/lib/python3.6/runpy.py", line 193, in _run_module_as_main
        "__main__", mod_spec)
      File "/usr/local/miniconda3/lib/python3.6/runpy.py", line 85, in _run_code
        exec(code, run_globals)
      File "/usr/local/miniconda3/lib/python3.6/site-packages/torch/distributed/launch.py", line 253, in <module>
        main()
      File "/usr/local/miniconda3/lib/python3.6/site-packages/torch/distributed/launch.py", line 249, in main
        cmd=cmd)
    subprocess.CalledProcessError: Command '['/usr/bin/python', '-u', './tools/test.py', '--local_rank=1', 'configs/mot/tracktor/tracktor_faster-rcnn_r50_fpn_4e_mot17-public-half.py', '--launcher', 'pytorch', '--eval', 'track']' returned non-zero exit status 1.

However, when I test the model on a single GPU with the command

    python ./tools/test.py configs/mot/tracktor/tracktor_faster-rcnn_r50_fpn_4e_mot17-public-half.py --eval track

it works. Could you please help me solve this problem? Thanks a lot!

gsygsy96 • Jan 27 '21 06:01
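The tail of the first traceback (popen_spawn_posix.py, then reduction.dump and ForkingPickler(...).dump) shows the failure happens while multiprocessing pickles the object it is about to hand to a spawned child process. Under the spawn start method everything sent to the child must be pickled, and lock objects such as threading.RLock cannot be. On Linux the default start method is fork, which copies the parent process without pickling; this may explain why the single-GPU command succeeds. We believe mmcv 1.x's init_dist (used on the distributed path) switches the start method to spawn when none has been set, but that is our reading of mmcv, not something confirmed in this thread. The thread also does not say which object holds the lock, so HoldsLock, PicklableHoldsLock, and find_unpicklable_attrs below are hypothetical illustrations, not mmtracking or mmcv code.

```python
import pickle
import threading


class HoldsLock:
    """Hypothetical stand-in for whatever object (dataset, wrapper, ...)
    ends up carrying a _thread.RLock."""

    def __init__(self):
        self.name = "demo"
        self.lock = threading.RLock()


class PicklableHoldsLock(HoldsLock):
    """Same object, but it drops the lock when pickled and recreates it
    when unpickled in the child process."""

    def __getstate__(self):
        state = self.__dict__.copy()
        del state["lock"]  # lock objects cannot be sent to another process
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        self.lock = threading.RLock()  # fresh, unlocked lock in the child


def find_unpicklable_attrs(obj):
    """Pickle each attribute in vars(obj) on its own and return
    {attribute_name: "ErrorType: message"} for the ones that fail."""
    bad = {}
    for name, value in vars(obj).items():
        try:
            pickle.dumps(value)
        except Exception as exc:  # TypeError, PicklingError, AttributeError, ...
            bad[name] = "{}: {}".format(type(exc).__name__, exc)
    return bad


if __name__ == "__main__":
    # Reports the 'lock' attribute as the culprit.
    print(find_unpicklable_attrs(HoldsLock()))
    # The patched class survives a pickle round trip with a new lock.
    restored = pickle.loads(pickle.dumps(PicklableHoldsLock()))
    print(restored.name, type(restored.lock).__name__)
```

If a helper like find_unpicklable_attrs points at a lock-holding attribute of, say, the dataset object, the __getstate__/__setstate__ pattern is one common way to make that object spawn-safe. Explicitly choosing the fork start method before distributed initialization is another option worth experimenting with, under the same caveats.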

BTW, my environment is PyTorch 1.3, CUDA 10.0, mmcv 1.2.6, mmdet 2.8.0, and Python 3.6.

gsygsy96 • Jan 27 '21 06:01
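The versions in the comment above were listed by hand. A short script can gather them, plus the CUDA version PyTorch was built with and the number of visible GPUs, in one go. This is a sketch: collect_basic_env is a hypothetical helper, not part of mmtracking, and the import names (torch, mmcv, mmdet, mmtrack) are assumptions about what is installed.

```python
import importlib
import platform
import sys


def collect_basic_env(packages=("torch", "mmcv", "mmdet", "mmtrack")):
    """Return a dict with the Python/platform info and the __version__ of
    each requested package; packages that fail to import are marked."""
    env = {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
    }
    for name in packages:
        try:
            module = importlib.import_module(name)
        except ImportError:
            env[name] = "not installed"
            continue
        except Exception as exc:  # e.g. broken compiled extensions
            env[name] = "import failed ({})".format(type(exc).__name__)
            continue
        env[name] = getattr(module, "__version__", "unknown")
        if name == "torch":
            # CUDA version torch was built with (None for CPU-only builds)
            # and how many GPUs this process can see.
            env["torch_cuda_build"] = module.version.cuda
            env["cuda_available"] = module.cuda.is_available()
            env["gpu_count"] = module.cuda.device_count()
    return env


if __name__ == "__main__":
    for key, value in collect_basic_env().items():
        print("{}: {}".format(key, value))
```

Pasting this output into an issue gives maintainers the interpreter, library, and GPU details in one place.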

How many GPUs do you have on your machine?

Can you try another Python version, such as 3.7 or 3.8?

OceanPang • Jan 27 '21 08:01

I have 2 GPUs on my machine.

gsygsy96 • Jan 27 '21 08:01

@gsygsygsy123 I'm not sure whether you have fixed this, but we can hardly trace the error with the limited information you provided. I would recommend trying PyTorch 1.5+ and, if the error still happens, posting your environment information and the full stack trace.

noahcao • May 06 '22 06:05
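The output in the first post mixes a worker process's TypeError with the launcher's CalledProcessError, which makes the full stack hard to read. One way to get a complete traceback per process is to wrap the script's entry point so that each rank writes its own log. In the reported setup, torch.distributed.launch passes --local_rank=N to tools/test.py (visible in the failing command), while newer launchers set the LOCAL_RANK environment variable instead; the helper below checks both. get_local_rank and run_with_rank_log are hypothetical names, and the error_rank<N>.log file name is an arbitrary choice.

```python
import argparse
import os
import traceback


def get_local_rank(argv=None):
    """Read the rank from --local_rank=N (as torch.distributed.launch passes
    it), else from the LOCAL_RANK environment variable, else return 0."""
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("--local_rank", type=int, default=None)
    args, _unknown = parser.parse_known_args(argv)  # ignore all other flags
    if args.local_rank is not None:
        return args.local_rank
    return int(os.environ.get("LOCAL_RANK", 0))


def run_with_rank_log(main_fn, argv=None, log_dir="."):
    """Call main_fn(); if it raises, write the full traceback to
    <log_dir>/error_rank<N>.log so ranks do not interleave, then re-raise."""
    rank = get_local_rank(argv)
    try:
        return main_fn()
    except Exception:
        path = os.path.join(log_dir, "error_rank{}.log".format(rank))
        with open(path, "w") as f:
            f.write(traceback.format_exc())
        raise
```

Assuming tools/test.py defines a main() function that it calls under `if __name__ == '__main__':`, replacing that call with run_with_rank_log(main) should leave one log file per failing rank; this has not been tested against mmtracking itself.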