MapTR

Multi-GPU training fails

Open youhha opened this issue 6 months ago • 0 comments

I am training MapTR on four 4090 GPUs with the nuScenes dataset and get the following error. What could be causing it?

Traceback (most recent call last):
  File "./tools/train.py", line 260, in <module>
    main()
  File "./tools/train.py", line 249, in main
    custom_train_model(
  File "/workspace/code/projects/mmdet3d_plugin/bevformer/apis/train.py", line 27, in custom_train_model
    custom_train_detector(
  File "/workspace/code/projects/mmdet3d_plugin/bevformer/apis/mmdet_train.py", line 75, in custom_train_detector
    model = MMDistributedDataParallel(
  File "/usr/local/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 496, in __init__
    dist._verify_model_across_ranks(self.process_group, parameters)
RuntimeError: replicas[0][0] in this process with sizes [80, 128] appears not to match sizes of the same param in process 0.

youhha, Aug 21 '24 03:08