HRMapNet

Unable to run tests

Open · IgorZhiltsoff opened this issue 8 months ago · 2 comments

I tried running

```
./tools/dist_test_map.sh ./projects/configs/hrmapnet/hrmapnet_maptrv2_nusc_r50_110ep.py ./ckpts/hrmapnet_maptrv2_nuscenes_ep110.pth 1
```

(the checkpoint was downloaded from the link in the repo's README)

and got an error:

```
Traceback (most recent call last):
  File "./tools/test.py", line 264, in <module>
    main()
  File "./tools/test.py", line 229, in main
    model = MMDistributedDataParallel(
  File "/root/miniconda3/envs/smth/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 496, in __init__
    dist._verify_model_across_ranks(self.process_group, parameters)
RuntimeError: NCCL error in: ../torch/lib/c10d/ProcessGroupNCCL.cpp:911, unhandled system error, NCCL version 2.7.8
ncclSystemError: System call (socket, malloc, munmap, etc) failed.
```

Any ideas how to fix this?
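A minimal diagnostic sketch, not taken from this thread and not necessarily the fix linked later: `ncclSystemError` only says that some system call failed, so a common first step is to re-run with NCCL's `NCCL_DEBUG=INFO` logging and to check free space on `/dev/shm`. A too-small shared-memory mount (for example, a Docker container started without a larger `--shm-size`) is one frequently reported cause, but whether it applies here is an assumption. The 1 GiB threshold and the helper names `nccl_debug_env` and `shm_free_bytes` are hypothetical.

```python
import os
import shutil

# Hypothetical threshold: flag /dev/shm when it has less than 1 GiB free.
MIN_SHM_BYTES = 1 << 30


def nccl_debug_env(extra=None):
    """Return a copy of the environment with NCCL's verbose logging enabled.

    With NCCL_DEBUG=INFO, NCCL logs details about its initialization, which
    usually narrows down the generic ncclSystemError. os.environ itself is
    not modified.
    """
    env = dict(os.environ)
    env["NCCL_DEBUG"] = "INFO"
    if extra:
        env.update(extra)
    return env


def shm_free_bytes(path="/dev/shm"):
    """Return free bytes on the shared-memory mount, or None if it is absent."""
    if not os.path.isdir(path):
        return None
    return shutil.disk_usage(path).free


if __name__ == "__main__":
    free = shm_free_bytes()
    if free is None:
        print("/dev/shm not found")
    elif free < MIN_SHM_BYTES:
        print(f"/dev/shm has only {free / 2**20:.0f} MiB free; NCCL may fail here")
    else:
        print(f"/dev/shm has {free / 2**30:.1f} GiB free")
```

The returned environment can be passed to the test command, e.g. `subprocess.run(["./tools/dist_test_map.sh", cfg, ckpt, "1"], env=nccl_debug_env())`. Passing `{"NCCL_SHM_DISABLE": "1"}` as `extra` is one way to check whether NCCL's shared-memory transport is involved.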


My package versions are not exactly the ones you specified in the installation guide; namely, I

  1. Downgraded av2 to the minimum version,
  2. Downgraded numpy to 1.23.0,
  3. Installed gcc-multilib,
  4. Upgraded gcc to version 7 (https://anaconda.org/gouarin/gcc-7),
  5. Upgraded networkx to 3.1
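Since this environment differs from the installation guide, it may help to post the exact installed versions alongside the error. A minimal sketch using the standard-library `importlib.metadata` (Python 3.8+, matching the `python3.8` path in the traceback); the package list is an assumption based on the items above plus `torch`, and `installed_versions` is a hypothetical helper name.

```python
from importlib import metadata

# Distributions worth reporting for this issue (assumed list).
PACKAGES = ["torch", "numpy", "networkx", "av2"]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(PACKAGES).items():
        print(f"{name}: {version or 'not installed'}")
```

The NCCL version bundled with PyTorch (2.7.8 in the traceback above) can be checked separately with `torch.cuda.nccl.version()`.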

IgorZhiltsoff · Apr 25 '25 10:04

Same error. Any solutions?

ShenZheng2000 · Jun 28 '25 04:06

Issue solved. Please check this comment

ShenZheng2000 · Jul 02 '25 20:07