fix problem when training yolov3 with no-valid-bbox img

Open Aktcob opened this issue 4 years ago • 13 comments

You can now train YOLOv3 with negative images, that is, images that contain no valid bounding box.

Aktcob avatar Dec 02 '20 03:12 Aktcob

Job PR-1555-1 is done. Docs are uploaded to http://gluon-vision-staging.s3-website-us-west-2.amazonaws.com/PR-1555/1/index.html Code coverage of this PR: pr.svg vs. Master: master.svg

mli avatar Dec 02 '20 04:12 mli

@Aktcob Do you have real training results on a dataset that mixes in images with no ground truth? Does the change cause NaNs?

zhreshold avatar Dec 02 '20 22:12 zhreshold

@zhreshold Hi, I have not seen any NaNs so far when training on my custom datasets.

BTW, I have been training this way for a long time. I think many people run into this problem when training YOLO, so I created this PR.

Aktcob avatar Dec 03 '20 01:12 Aktcob

Job PR-1555-2 is done. Docs are uploaded to http://gluon-vision-staging.s3-website-us-west-2.amazonaws.com/PR-1555/2/index.html Code coverage of this PR: pr.svg vs. Master: master.svg

mli avatar Dec 03 '20 20:12 mli

I also have a solution for training Faster R-CNN with no-valid-bbox images. I found that the data transforms may change the values of the label (the box), so you need to restore the -1-valued box after the transforms.
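A minimal sketch of that restore step, not the code used in either solution. It assumes each label row is laid out as [xmin, ymin, xmax, ymax, class_id] as in the examples in this thread, and `restore_padding_boxes` is a hypothetical helper name:

```python
import numpy as np

def restore_padding_boxes(label):
    """Reset every row whose class id is negative back to all -1.

    Assumes rows are [xmin, ymin, xmax, ymax, class_id] (an assumption
    made for this sketch; real labels may carry extra columns).
    """
    label = np.array(label, dtype=np.float32, copy=True)
    fake = label[:, 4] < 0      # rows that were padding before the transforms
    label[fake] = -1.0          # overwrite the shifted coordinates
    return label

# One real box plus one padding row whose coordinates a transform has shifted.
transformed = np.array([[30, 40, 120, 160, 3],
                        [200, 200, 200, 200, -1]], dtype=np.float32)
restored = restore_padding_boxes(transformed)
print(restored)
```

The class id is used as the marker because, as noted in this thread, it is the one field the transforms leave at -1.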

chinakook avatar Dec 24 '20 15:12 chinakook

@chinakook Yep. The data transforms may change the ground-truth bbox [-1,-1,-1,-1,-1] to [xx, xx, xx, xx, -1]. However, the class is still -1, so during label assignment this fake bbox should be removed based on its class of -1.

BTW, the data transforms cannot change the size of this bbox, so the four xx values stay equal to each other. For example, [-1,-1,-1,-1,-1] becomes [200,200,200,200,-1]. This fake bbox will not be assigned to any anchor because its IoU is zero.

Refer to: https://github.com/dmlc/gluon-cv/pull/1555/files#diff-6d664a77b7ba0b38ea36b82ebc52b4a3ff73a08ff485d45582b6b87b2e9019d7R106

    if np_gt_ids[b, m, 0] < 0:
        # ignore fake bbox
        break
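To show that check in context, here is a self-contained sketch of a loop that skips fake boxes by class id. It is an illustration, not the PR's target generator: `iter_valid_gt` is a hypothetical helper, and the shapes (B, M, 4) for boxes and (B, M, 1) for ids are assumed from the indexing above.

```python
import numpy as np

def iter_valid_gt(gt_boxes, gt_ids):
    """Yield (batch, index, box, class_id) for real ground-truth boxes only."""
    for b in range(gt_boxes.shape[0]):
        for m in range(gt_boxes.shape[1]):
            if gt_ids[b, m, 0] < 0:
                # Fake/padding box. `break` assumes padding rows come after
                # all real boxes of the image (an assumption of this sketch).
                break
            yield b, m, gt_boxes[b, m], int(gt_ids[b, m, 0])

# Image 0: one real box followed by padding. Image 1: negative image,
# whose only row is a fake box already shifted by the transforms.
gt_boxes = np.array([[[10, 20, 50, 80], [-1, -1, -1, -1]],
                     [[200, 200, 200, 200], [-1, -1, -1, -1]]], dtype=np.float32)
gt_ids = np.array([[[7], [-1]],
                   [[-1], [-1]]], dtype=np.float32)

valid = list(iter_valid_gt(gt_boxes, gt_ids))
print(valid)
```

With this input only the single real box of image 0 reaches label assignment; the negative image contributes no targets.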

Aktcob avatar Dec 25 '20 03:12 Aktcob

@Aktcob In Faster R-CNN, the sampler treats [-1,-1,-1,-1,-1] as an ignored box, but treats [200,200,200,200,-1] as a negative box, because the trailing -1 marks the class as background (negative). That box is then trained together with the positive boxes, with only the softmax cross-entropy loss and no box regression. I don't know how YOLO handles this, but I think we should ignore this box, because we only need the other negative boxes in the image. The [-1,-1,-1,-1,-1] sample should only be there to keep the model running normally; it should not be trained on.

chinakook avatar Dec 25 '20 07:12 chinakook

@chinakook But [200,200,200,200] is a degenerate bbox: a point with zero width and zero height. No anchor can match it.

An invalid box should have an area of 0. https://github.com/dmlc/gluon-cv/blob/master/gluoncv/model_zoo/rcnn/faster_rcnn/rcnn_target.py#L50
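A quick numerical check of the zero-area argument, using a plain corner-format IoU written for this sketch (`iou_one_to_many` is a hypothetical helper; the matching code in gluon-cv may compute overlap differently):

```python
import numpy as np

def iou_one_to_many(box, anchors):
    """IoU between one [xmin, ymin, xmax, ymax] box and an (N, 4) anchor array."""
    x1 = np.maximum(box[0], anchors[:, 0])
    y1 = np.maximum(box[1], anchors[:, 1])
    x2 = np.minimum(box[2], anchors[:, 2])
    y2 = np.minimum(box[3], anchors[:, 3])
    # Clip so that non-overlapping pairs get zero intersection, not negative.
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_box = (box[2] - box[0]) * (box[3] - box[1])
    area_anchors = (anchors[:, 2] - anchors[:, 0]) * (anchors[:, 3] - anchors[:, 1])
    union = area_box + area_anchors - inter
    return inter / np.maximum(union, 1e-12)  # guard against division by zero

fake = np.array([200.0, 200.0, 200.0, 200.0])          # zero width, zero height
anchors = np.array([[150, 150, 250, 250],              # contains the point
                    [0, 0, 100, 100],                  # far away
                    [190, 190, 210, 210]], dtype=np.float64)
ious = iou_one_to_many(fake, anchors)
print(ious)
```

Even the anchors that contain the point (200, 200) get an intersection area of zero, so no IoU threshold above zero can be met.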

Aktcob avatar Dec 25 '20 08:12 Aktcob

But I found no code that verifies the area. The line below would label [200,200,200,200,-1] as negative, not ignored. https://github.com/dmlc/gluon-cv/blob/0f0b226e71519493e0744017f2af3b71ad77eb1c/gluoncv/model_zoo/rcnn/faster_rcnn/rcnn_target.py#L68
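A toy sketch of what such an area check could look like. This is not gluon-cv's sampler: `label_candidates`, the 0.5 threshold, and the +1 / -1 / 0 encoding for positive / negative / ignored are all assumptions made for illustration.

```python
import numpy as np

def label_candidates(candidates, max_iou, pos_iou_thresh=0.5):
    """Label candidate boxes: +1 positive, -1 negative, 0 ignored.

    candidates: (N, 4) boxes as [xmin, ymin, xmax, ymax]
    max_iou:    (N,) best IoU of each candidate with any ground-truth box
    """
    labels = np.where(max_iou >= pos_iou_thresh, 1, -1)
    width = candidates[:, 2] - candidates[:, 0]
    height = candidates[:, 3] - candidates[:, 1]
    # Zero-area candidates (such as a shifted fake box) are ignored
    # instead of being trained on as background.
    labels[(width <= 0) | (height <= 0)] = 0
    return labels

candidates = np.array([[10, 10, 50, 50],
                       [60, 60, 90, 90],
                       [200, 200, 200, 200]], dtype=np.float32)
max_iou = np.array([0.8, 0.1, 0.0])
labels = label_candidates(candidates, max_iou)
print(labels)
```

Without the area mask the third candidate would fall below the threshold and be labeled negative, which is the behavior described above.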

chinakook avatar Dec 25 '20 08:12 chinakook

@chinakook Yes, you are right. The target generator of Faster R-CNN needs some changes. But for now, training YOLOv3 works fine.

Aktcob avatar Dec 25 '20 09:12 Aktcob

Did it solve the NaN problem?

chinakook avatar Dec 25 '20 10:12 chinakook

There is no NaN problem when training YOLOv3.

Aktcob avatar Dec 25 '20 10:12 Aktcob

@Aktcob Since the VOC detection dataset is modified, all object detectors will be affected. I will hold this until training of the other detectors is verified.

zhreshold avatar Jan 06 '21 21:01 zhreshold