QueryDet-PyTorch

Hello, when I run train_coco.py, I get this error: AttributeError: 'Trainer' object has no attribute '_detect_anomaly'. Could you help me?

Open · bling-bling-only opened this issue 2 years ago · 5 comments

Traceback (most recent call last):
  File "train_coco.py", line 15, in <module>
    launch(
  File "/home/lei/anaconda3/envs/py38/lib/python3.8/site-packages/detectron2/engine/launch.py", line 82, in launch
    main_func(args)
  File "/home/lei/Project/QueryDet-PyTorch-main/train_tools/coco_train.py", line 177, in start_train
    return trainer.train()
  File "/home/lei/Project/QueryDet-PyTorch-main*/apex_tools/apex_trainer.py", line 227, in train
    self.run_step()
  File "/home/lei/Project/QueryDet-PyTorch-main*/apex_tools/apex_trainer.py", line 251, in run_step
    self._detect_anomaly(losses, loss_dict)
AttributeError: 'Trainer' object has no attribute '_detect_anomaly'

bling-bling-only · Jul 04 '22 11:07

I have the same problem.

choasup · Jul 18 '22 11:07

Which version of apex are you using? Is it the same as the author's?

bling-bling-only · Jul 19 '22 00:07

I ran into the same problem. It is caused by the detectron2 version: this code expects detectron2 0.2.1. Newer releases of detectron2, such as 0.3, removed the _detect_anomaly method from the SimpleTrainer class.

I therefore recommend installing detectron2 version 0.2.1.

Installation link: https://github.com/facebookresearch/detectron2/releases
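Before reinstalling, it may help to confirm which case applies. The sketch below is not part of QueryDet; has_detect_anomaly and report_detectron2 are hypothetical helper names, and the only detectron2 names it relies on are detectron2.__version__ and detectron2.engine.SimpleTrainer.

```python
import importlib


def has_detect_anomaly(trainer_cls) -> bool:
    """Return True if trainer_cls (or one of its base classes) defines _detect_anomaly."""
    return callable(getattr(trainer_cls, "_detect_anomaly", None))


def report_detectron2() -> str:
    """Describe the installed detectron2 version and whether SimpleTrainer
    still provides _detect_anomaly, which the APEX trainer calls in run_step."""
    try:
        d2 = importlib.import_module("detectron2")
        engine = importlib.import_module("detectron2.engine")
    except ImportError:
        return "detectron2 is not installed"
    present = has_detect_anomaly(engine.SimpleTrainer)
    return "detectron2 {}: SimpleTrainer._detect_anomaly is {}".format(
        d2.__version__, "present" if present else "missing"
    )


if __name__ == "__main__":
    print(report_detectron2())
```

If the report says the method is missing, either install 0.2.1 or add the method to the trainer yourself.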

XinZhangRadar · Jul 29 '22 06:07

Thank you for your reply. I had suspected the detectron2 version, but my GPU is a 30-series card, and the oldest detectron2 release that supports it is 0.3. Your answer confirms my suspicion.

bling-bling-only · Jul 29 '22 06:07


You can still work around this issue by adding the following method to the ApexTrainer class in apex_tools/apex_trainer.py (make sure torch is imported in that file):

def _detect_anomaly(self, losses, loss_dict):
    # Stop training with a clear error if the total loss is NaN or infinite
    if not torch.isfinite(losses).all():
        raise FloatingPointError(
            "Loss became infinite or NaN at iteration={}!\nloss_dict = {}".format(
                self.iter, loss_dict
            )
        )
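If the same trainer code has to run on both detectron2 0.2.1 and newer releases, one option is to attach the method only when it is missing. This is a sketch under assumptions: ensure_detect_anomaly is a hypothetical helper, and the check uses math.isfinite(float(losses)) instead of torch.isfinite, assuming the total loss is a scalar (a 0-dim tensor or a plain float) so the snippet runs without torch.

```python
import math


def _detect_anomaly(self, losses, loss_dict):
    """Raise if the total loss is NaN or infinite."""
    # Assumption: `losses` is a scalar; float() accepts both a Python float
    # and a 0-dim tensor.
    if not math.isfinite(float(losses)):
        raise FloatingPointError(
            "Loss became infinite or NaN at iteration={}!\nloss_dict = {}".format(
                self.iter, loss_dict
            )
        )


def ensure_detect_anomaly(trainer_cls):
    """Hypothetical helper: attach _detect_anomaly only when trainer_cls lacks it,
    so detectron2 versions that still provide the method keep their own."""
    if not hasattr(trainer_cls, "_detect_anomaly"):
        trainer_cls._detect_anomaly = _detect_anomaly
    return trainer_cls
```

For example, put @ensure_detect_anomaly on the line above class ApexTrainer(...), or call ensure_detect_anomaly(ApexTrainer) once after the class is defined.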

sotoy · Aug 31 '22 07:08

Hi, this problem is caused by the APEX library. We have recently updated the whole repository, so APEX is no longer needed.

ChenhongyiYang · Feb 16 '23 18:02