HRMapNet
Unable to run tests
I tried running
./tools/dist_test_map.sh ./projects/configs/hrmapnet/hrmapnet_maptrv2_nusc_r50_110ep.py ./ckpts/hrmapnet_maptrv2_nuscenes_ep110.pth 1
(the checkpoint is downloaded from the link in repo's README)
and got this error:
Traceback (most recent call last):
  File "./tools/test.py", line 264, in <module>
    main()
  File "./tools/test.py", line 229, in main
    model = MMDistributedDataParallel(
  File "/root/miniconda3/envs/smth/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 496, in __init__
    dist._verify_model_across_ranks(self.process_group, parameters)
RuntimeError: NCCL error in: ../torch/lib/c10d/ProcessGroupNCCL.cpp:911, unhandled system error, NCCL version 2.7.8
ncclSystemError: System call (socket, malloc, munmap, etc) failed.
Any ideas how to fix this?
My package versions are not exactly the ones you specified in the installation guide; namely, I
- Downgraded av2 to minimum,
- Downgraded numpy to 1.23.0,
- Installed gcc-multilib,
- Upgraded gcc to 7 (https://anaconda.org/gouarin/gcc-7),
- Upgraded networkx to 3.1
Same error. Any solutions?
Issue solved. Please check this comment