mmdetection PAA training error

Thanks for your error report and we appreciate it a lot.

Checklist

I have searched related issues but cannot get the expected help.
I have read the FAQ documentation but cannot get the expected help.
The bug has not been fixed in the latest version.

Describe the bug

PAA detector training breaks.

Reproduction

What command or script did you run?

python tools/train.py configs/paa/paa_r50_fpn_1x_coco.py

Did you make any modifications on the code or config? Did you understand what you have modified? No
What dataset did you use? COCO

Environment

Please run python mmdet/utils/collect_env.py to collect necessary environment information and paste it here.
You may add additional information that may be helpful for locating the problem, such as
- How you installed PyTorch [e.g., pip, conda, source]
- Other environment variables that may be related (such as $PATH, $LD_LIBRARY_PATH, $PYTHONPATH, etc.)

Error traceback

If applicable, paste the error traceback here.

Traceback (most recent call last):
  File "/home/user/mmdetection/tools/train.py", line 242, in <module>
    main()
  File "/home/user/mmdetection/tools/train.py", line 231, in main
    train_detector(
  File "/home/user/mmdetection/mmdet/apis/train.py", line 244, in train_detector
    runner.run(data_loaders, cfg.workflow)
  File "/home/user/mmcv/mmcv/runner/epoch_based_runner.py", line 127, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/home/user/mmcv/mmcv/runner/epoch_based_runner.py", line 50, in train
    self.run_iter(data_batch, train_mode=True, **kwargs)
  File "/home/user/mmcv/mmcv/runner/epoch_based_runner.py", line 29, in run_iter
    outputs = self.model.train_step(data_batch, self.optimizer,
  File "/home/user/mmcv/mmcv/parallel/data_parallel.py", line 75, in train_step
    return self.module.train_step(*inputs[0], **kwargs[0])
  File "/home/user/mmdetection/mmdet/models/detectors/base.py", line 248, in train_step
    losses = self(**data)
  File "/home/user/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/user/mmcv/mmcv/runner/fp16_utils.py", line 110, in new_func
    return old_func(*args, **kwargs)
  File "/home/user/mmdetection/mmdet/models/detectors/base.py", line 172, in forward
    return self.forward_train(img, img_metas, **kwargs)
  File "/home/user/mmdetection/mmdet/models/detectors/single_stage.py", line 83, in forward_train
    losses = self.bbox_head.forward_train(x, img_metas, gt_bboxes,
  File "/home/user/mmdetection/mmdet/models/dense_heads/base_dense_head.py", line 335, in forward_train
    losses = self.loss(*loss_inputs, gt_bboxes_ignore=gt_bboxes_ignore)
  File "/home/user/mmcv/mmcv/runner/fp16_utils.py", line 198, in new_func
    return old_func(*args, **kwargs)
  File "/home/user/mmdetection/mmdet/models/dense_heads/paa_head.py", line 152, in loss
    reassign_bbox_weights, num_pos = multi_apply(
  File "/home/user/mmdetection/mmdet/core/utils/misc.py", line 30, in multi_apply
    return tuple(map(list, zip(*map_results)))
  File "/home/user/mmdetection/mmdet/models/dense_heads/paa_head.py", line 349, in paa_reassign
    gmm.fit(pos_loss_gmm)
  File "/home/user/anaconda3/lib/python3.9/site-packages/sklearn/mixture/_base.py", line 193, in fit
    self.fit_predict(X, y)
  File "/home/user/anaconda3/lib/python3.9/site-packages/sklearn/mixture/_base.py", line 246, in fit_predict
    self._m_step(X, log_resp)
  File "/home/user/anaconda3/lib/python3.9/site-packages/sklearn/mixture/_gaussian_mixture.py", line 691, in _m_step
    self.precisions_cholesky_ = _compute_precision_cholesky(
  File "/home/user/anaconda3/lib/python3.9/site-packages/sklearn/mixture/_gaussian_mixture.py", line 333, in _compute_precision_cholesky
    raise ValueError(estimate_precision_error_message)
ValueError: Fitting the mixture model failed because some components have ill-defined empirical covariance (for instance caused by singleton or collapsed samples). Try to decrease the number of components, or increase reg_covar.

Bug fix

If you have already identified the reason, you can provide the information here. If you are willing to create a PR to fix it, please also leave a comment here and that would be much appreciated!
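
The error message itself suggests increasing reg_covar, so one possible workaround is to retry the fit with a larger value whenever scikit-learn raises this ValueError. The following is only a minimal sketch of that idea, not a confirmed fix and not how paa_head.py is written: fit_with_growing_reg_covar, make_sklearn_gmm and the list of reg_covar values are hypothetical names and numbers chosen for illustration. Only GaussianMixture and its n_components, covariance_type and reg_covar parameters come from scikit-learn.

```python
from typing import Callable, Iterable, Optional


def fit_with_growing_reg_covar(
    make_gmm: Callable[[float], object],
    samples,
    reg_covars: Iterable[float] = (1e-6, 1e-5, 1e-4, 1e-3),
) -> Optional[object]:
    """Fit a mixture model, raising reg_covar after each failed attempt.

    make_gmm(reg_covar) must return an unfitted estimator with a fit() method.
    Returns the fitted estimator, or None if every attempt raised ValueError.
    """
    for reg_covar in reg_covars:
        gmm = make_gmm(reg_covar)
        try:
            gmm.fit(samples)
        except ValueError:
            # Ill-defined covariance (or other invalid input): try again
            # with stronger regularization on the diagonal.
            continue
        return gmm
    return None


def make_sklearn_gmm(reg_covar: float):
    """Hypothetical factory: a two-component diagonal GaussianMixture.

    scikit-learn is imported lazily so the helper above can be exercised
    without it being installed.
    """
    from sklearn.mixture import GaussianMixture

    return GaussianMixture(
        n_components=2, covariance_type="diag", reg_covar=reg_covar
    )
```

With scikit-learn installed, the call would look like fit_with_growing_reg_covar(make_sklearn_gmm, pos_loss_gmm). A None result means no value in the list was enough. How paa_reassign should treat that case (for example, by skipping reassignment for that sample) is an open design choice that this sketch does not settle. Note that a ValueError caused by NaN inputs would also end in None, since more regularization cannot repair that.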

Sep 01 '22 08:09 toyot-li

https://github.com/open-mmlab/mmdetection/blob/master/mmdet/models/dense_heads/paa_head.py#L340 Modify this line. You can google it; many people have run into this problem.

Sep 01 '22 10:09 FDInSky

@FDInSky could you please be more specific? Or post a link to the relevant issue?

Sep 01 '22 11:09 toyot-li

Hi @zen-d, could you be more specific? For example, did you train the original PAA config on the COCO dataset, or did you make any modifications?

Sep 04 '22 04:09 ZwwWayne

Hi @ZwwWayne, as my reply was initially posted in the Reproduction part:

The dataset is official COCO'17 detection.
I did not make any modifications to the code or cfg.

Sep 04 '22 04:09 toyot-li

@RangiLyu @ZwwWayne any fix?

Sep 15 '22 02:09 toyot-li

@RangiLyu @ZwwWayne any fix?

Could you try the method in https://github.com/open-mmlab/mmdetection/issues/4152?

Sep 15 '22 07:09 RangiLyu

This issue is marked as stale because it has been marked as invalid or awaiting response for 7 days without any further response. It will be closed in 5 days if the stale label is not removed or if there is no further response.

Sep 22 '22 11:09 github-actions[bot]

This issue is closed because it has been stale for 5 days. Please open a new issue if you have similar issues or you have any new updates now.

Sep 27 '22 11:09 github-actions[bot]
