Multi-GPU training

Open starsky68 opened this issue 3 years ago • 0 comments

When training on multiple GPUs with torch.nn.DataParallel, the train phase runs fine, but the val phase raises an error:

Traceback (most recent call last):
  File "train.py", line 347, in <module>
    train()
  File "train.py", line 315, in train
    evaluator.evaluate(model)
  File "/home/11/CenterNet-Lite-master/utils/cocoapi_evaluator.py", line 196, in evaluate
    outputs = model(x)
  File "/home/11/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/11/.local/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py", line 161, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "/home/11/.local/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py", line 171, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "/home/11/.local/lib/python3.8/site-packages/torch/nn/parallel/parallel_apply.py", line 86, in parallel_apply
    output.reraise()
  File "/home/11/.local/lib/python3.8/site-packages/torch/_utils.py", line 428, in reraise
    raise self.exc_type(msg)
TypeError: Caught TypeError in replica 0 on device 0.
Original Traceback (most recent call last):
  File "/home/11/.local/lib/python3.8/site-packages/torch/nn/parallel/parallel_apply.py", line 61, in _worker
    output = module(*input, **kwargs)
  File "/home/11/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/11/CenterNet-Lite-master/models/centernet.py", line 215, in forward
    cls_loss, txty_loss, twth_loss, total_loss = tools.loss(pred_cls=cls_pred,
  File "/home/11/CenterNet-Lite-master/tools.py", line 160, in loss
    gt_cls = label[:, :, :num_classes].float()
TypeError: 'NoneType' object is not subscriptable
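The traceback shows the evaluator calling `model(x)` without labels while `forward` still runs the loss branch. One possible cause, which the code shown here does not confirm, is that a train/eval flag was set on the `DataParallel` wrapper rather than on the wrapped module. The sketch below reproduces that pattern with a hypothetical `ToyDetector`; the class, its `trainable` flag, and the `target` argument are assumptions for illustration, not the CenterNet-Lite implementation.

```python
import torch
import torch.nn as nn


class ToyDetector(nn.Module):
    """Hypothetical stand-in for the detector; not the CenterNet-Lite code."""

    def __init__(self, num_classes=3):
        super().__init__()
        self.num_classes = num_classes
        self.trainable = True  # assumed flag that selects the loss branch
        self.head = nn.Conv2d(3, num_classes, kernel_size=1)

    def forward(self, x, target=None):
        pred = self.head(x)
        if self.trainable:
            # Loss branch: indexing fails when no target is passed.
            gt_cls = target[:, :, :self.num_classes].float()
            return (pred.mean() - gt_cls.mean()) ** 2
        return pred


model = nn.DataParallel(ToyDetector())
x = torch.randn(2, 3, 8, 8)

# Setting the flag on the wrapper does not reach the wrapped module.
model.trainable = False
wrapper_flag, inner_flag = model.trainable, model.module.trainable

try:
    model(x)  # the wrapped module still takes the loss branch
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Setting the flag on the wrapped module selects the inference branch.
model.module.trainable = False
model.eval()
with torch.no_grad():
    outputs = model(x)
```

If this is what happens in `train.py`, setting the flag through `model.module` before calling the evaluator, or passing `model.module` to the evaluator, would be the things to try.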

starsky68 · May 31 '21 01:05