OccNet
error when testing
I trained the baseline on a single A100-40G using ./tools/dist_train.sh ./projects/configs/bevformer/bevformer_base_occ.py 1. After 24 epochs, I ran ./tools/dist_test.py ./projects/configs/bevformer/bevformer_base_occ.py work_dirs/bevformer_base_occ/epoch_24.pth 1. After the checkpoint was loaded and the 6019 tasks were evaluated, memory usage grew from 18G to 42G, and then the run suddenly failed with the error: torch.distributed.elastic.multiprocessing.api:failed. How can I fix this?