mmdetection
PAA training error
Thanks for your error report and we appreciate it a lot.
Checklist
- I have searched related issues but cannot get the expected help.
- I have read the FAQ documentation but cannot get the expected help.
- The bug has not been fixed in the latest version.
Describe the bug
PAA detector training breaks.
Reproduction
- What command or script did you run?
python tools/train.py configs/paa/paa_r50_fpn_1x_coco.py
- Did you make any modifications on the code or config? Did you understand what you have modified? No
- What dataset did you use? COCO
Environment
- Please run
python mmdet/utils/collect_env.py
to collect necessary environment information and paste it here.
- You may add additional information that may be helpful for locating the problem, such as
  - How you installed PyTorch [e.g., pip, conda, source]
  - Other environment variables that may be related (such as $PATH, $LD_LIBRARY_PATH, $PYTHONPATH, etc.)
Error traceback
If applicable, paste the error traceback here.
Traceback (most recent call last):
File "/home/user/mmdetection/tools/train.py", line 242, in <module>
main()
File "/home/user/mmdetection/tools/train.py", line 231, in main
train_detector(
File "/home/user/mmdetection/mmdet/apis/train.py", line 244, in train_detector
runner.run(data_loaders, cfg.workflow)
File "/home/user/mmcv/mmcv/runner/epoch_based_runner.py", line 127, in run
epoch_runner(data_loaders[i], **kwargs)
File "/home/user/mmcv/mmcv/runner/epoch_based_runner.py", line 50, in train
self.run_iter(data_batch, train_mode=True, **kwargs)
File "/home/user/mmcv/mmcv/runner/epoch_based_runner.py", line 29, in run_iter
outputs = self.model.train_step(data_batch, self.optimizer,
File "/home/user/mmcv/mmcv/parallel/data_parallel.py", line 75, in train_step
return self.module.train_step(*inputs[0], **kwargs[0])
File "/home/user/mmdetection/mmdet/models/detectors/base.py", line 248, in train_step
losses = self(**data)
File "/home/user/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/home/user/mmcv/mmcv/runner/fp16_utils.py", line 110, in new_func
return old_func(*args, **kwargs)
File "/home/user/mmdetection/mmdet/models/detectors/base.py", line 172, in forward
return self.forward_train(img, img_metas, **kwargs)
File "/home/user/mmdetection/mmdet/models/detectors/single_stage.py", line 83, in forward_train
losses = self.bbox_head.forward_train(x, img_metas, gt_bboxes,
File "/home/user/mmdetection/mmdet/models/dense_heads/base_dense_head.py", line 335, in forward_train
losses = self.loss(*loss_inputs, gt_bboxes_ignore=gt_bboxes_ignore)
File "/home/user/mmcv/mmcv/runner/fp16_utils.py", line 198, in new_func
return old_func(*args, **kwargs)
File "/home/user/mmdetection/mmdet/models/dense_heads/paa_head.py", line 152, in loss
reassign_bbox_weights, num_pos = multi_apply(
File "/home/user/mmdetection/mmdet/core/utils/misc.py", line 30, in multi_apply
return tuple(map(list, zip(*map_results)))
File "/home/user/mmdetection/mmdet/models/dense_heads/paa_head.py", line 349, in paa_reassign
gmm.fit(pos_loss_gmm)
File "/home/user/anaconda3/lib/python3.9/site-packages/sklearn/mixture/_base.py", line 193, in fit
self.fit_predict(X, y)
File "/home/user/anaconda3/lib/python3.9/site-packages/sklearn/mixture/_base.py", line 246, in fit_predict
self._m_step(X, log_resp)
File "/home/user/anaconda3/lib/python3.9/site-packages/sklearn/mixture/_gaussian_mixture.py", line 691, in _m_step
self.precisions_cholesky_ = _compute_precision_cholesky(
File "/home/user/anaconda3/lib/python3.9/site-packages/sklearn/mixture/_gaussian_mixture.py", line 333, in _compute_precision_cholesky
raise ValueError(estimate_precision_error_message)
ValueError: Fitting the mixture model failed because some components have ill-defined empirical covariance (for instance caused by singleton or collapsed samples). Try to decrease the number of components, or increase reg_covar.
Bug fix
If you have already identified the reason, you can provide the information here. If you are willing to create a PR to fix it, please also leave a comment here and that would be much appreciated!
Modify the code here: https://github.com/open-mmlab/mmdetection/blob/master/mmdet/models/dense_heads/paa_head.py#L340 You can Google the error; many people have run into this problem.
@FDInSky could you please specify it? Or post a link for the relevant issue?
Hi @zen-d, could you be more specific? For example, did you train the original PAA config on the COCO dataset, or did you make any modifications?
Hi @ZwwWayne, as my reply was initially posted in the Reproduction part:
- The dataset is official COCO'17 detection.
- I did not make any modifications to the code or cfg.
@RangiLyu @ZwwWayne any fix?
Could you try the method in https://github.com/open-mmlab/mmdetection/issues/4152?
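For readers hitting the same ValueError, here is a minimal sketch of one possible guard, based only on the hint in the scikit-learn error message ("increase reg_covar"). It is not the code in paa_head.py and not necessarily the method from the linked issue. The helper name fit_two_component_gmm and the reg_covars retry ladder are hypothetical, chosen for illustration; only GaussianMixture and its n_components, means_init, reg_covar, and random_state parameters are real scikit-learn API.

```python
import numpy as np
from sklearn.mixture import GaussianMixture


def fit_two_component_gmm(losses, reg_covars=(1e-6, 1e-4, 1e-2)):
    """Fit a 2-component GMM to 1-D losses, retrying with larger reg_covar.

    Hypothetical helper: returns the fitted GaussianMixture, or None when
    the input is unusable or every attempt raises ValueError.
    """
    x = np.asarray(losses, dtype=np.float64).reshape(-1, 1)

    # Skip inputs that cannot support two components at all.
    if x.shape[0] < 2 or not np.all(np.isfinite(x)):
        return None

    # Start one component at the smallest loss and one at the largest.
    means_init = np.array([x.min(), x.max()]).reshape(2, 1)

    for reg_covar in reg_covars:
        gmm = GaussianMixture(
            n_components=2,
            means_init=means_init,
            reg_covar=reg_covar,  # larger values keep covariances positive
            random_state=0,
        )
        try:
            return gmm.fit(x)  # fit() returns the estimator itself
        except ValueError:
            # Ill-defined covariance or similar: retry with more regularization.
            continue
    return None
```

A caller would treat a None result as "no reassignment possible for this ground-truth box" and fall back to whatever default it prefers. Whether that fallback is appropriate for PAA training is an assumption to check against the actual head implementation.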
This issue is marked as stale because it has been marked as invalid or awaiting response for 7 days without any further response. It will be closed in 5 days if the stale label is not removed or if there is no further response.
This issue is closed because it has been stale for 5 days. Please open a new issue if you have similar issues or you have any new updates now.