gluon-cv
Fix problem when training YOLOv3 with images that have no valid bbox
You can now train YOLOv3 with negative images.
Job PR-1555-1 is done.
Docs are uploaded to http://gluon-vision-staging.s3-website-us-west-2.amazonaws.com/PR-1555/1/index.html
Code coverage of this PR: vs. Master:
@Aktcob Do you have real training results on a dataset that mixes in images with no ground truth? Does the change cause NaNs?
@zhreshold Hi, I have not seen any NaNs so far when training on my custom datasets.
BTW, I have been training this way for a long time. I think many people run into this problem when training YOLO, so I created this PR.
Job PR-1555-2 is done.
Docs are uploaded to http://gluon-vision-staging.s3-website-us-west-2.amazonaws.com/PR-1555/2/index.html
I also have a solution for training Faster R-CNN with images that have no valid bbox. I found that the data transforms may change the values of the label boxes, so you need to restore the -1-valued box after the transforms.
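A minimal sketch of that restore step, assuming each label row is `[xmin, ymin, xmax, ymax, class_id]` and a negative class id marks a padded row. The helper name `restore_padding` is hypothetical and plain lists stand in for the label array; this is not code from gluon-cv.

```python
PADDING = [-1, -1, -1, -1, -1]  # sentinel row meaning "no object"


def restore_padding(label):
    """Reset every row whose class id is negative back to the sentinel."""
    return [list(PADDING) if row[4] < 0 else list(row) for row in label]


# Assumed example: a transform moved the padded row to (200, 200).
label_after_transform = [
    [30, 40, 120, 160, 2],      # real object, left untouched
    [200, 200, 200, 200, -1],   # padded row with shifted coordinates
]
restored = restore_padding(label_after_transform)
```

Keying on the class id rather than the coordinates matters here, because the coordinates are exactly what the transforms overwrite.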
@chinakook Yep. The data transforms may change the fake gt bbox [-1,-1,-1,-1,-1] to [xx, xx, xx, xx, -1]. However, the class is still -1, so during label assignment the fake bbox should be removed based on its class of -1.
BTW, the data transforms cannot change the size of the bbox, so the fake box stays degenerate (zero width and height). For example, [-1,-1,-1,-1,-1] becomes [200,200,200,200,-1]. This fake bbox will not be assigned to any anchor because its IoU with every anchor is zero.
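To illustrate the zero-IoU argument, here is a small sketch in plain Python. The horizontal flip and the anchor coordinates are assumed examples, and `flip_horizontal` and `iou` are hypothetical helpers rather than gluon-cv functions.

```python
def flip_horizontal(box, image_width):
    """Mirror a [xmin, ymin, xmax, ymax, class_id] row around the vertical axis."""
    xmin, ymin, xmax, ymax, cls = box
    return [image_width - xmax, ymin, image_width - xmin, ymax, cls]


def iou(a, b):
    """Intersection over union of two [xmin, ymin, xmax, ymax] boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


fake = [-1, -1, -1, -1, -1]
flipped = flip_horizontal(fake, image_width=200)  # coordinates move, class stays -1
anchor = [150, -50, 250, 50]                      # assumed anchor covering the point
overlap = iou(flipped[:4], anchor)
```

The flipped row still has `xmin == xmax`, so the intersection with any anchor has zero width and the IoU is zero even when the point lies inside the anchor.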
Refer to: https://github.com/dmlc/gluon-cv/pull/1555/files#diff-6d664a77b7ba0b38ea36b82ebc52b4a3ff73a08ff485d45582b6b87b2e9019d7R106
    if np_gt_ids[b, m, 0] < 0:
        # ignore fake bbox
        break
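For context, the check above could sit inside a loop like the following sketch. It assumes `gt_ids` has shape (batch, max_boxes, 1) and that padded rows always come after the real ones, which is what makes `break` safe. Nested lists stand in for the array, and `count_valid_boxes` is a hypothetical name.

```python
def count_valid_boxes(gt_ids):
    """Count the leading rows with a non-negative class id in each sample."""
    counts = []
    for b in range(len(gt_ids)):
        valid = 0
        for m in range(len(gt_ids[b])):
            if gt_ids[b][m][0] < 0:
                # ignore fake bbox; everything after it is padding too
                break
            valid += 1  # a real target would be assigned to an anchor here
        counts.append(valid)
    return counts


gt_ids = [
    [[3], [7], [-1]],    # two objects followed by one padded row
    [[-1], [-1], [-1]],  # negative image: only padded rows
]
counts = count_valid_boxes(gt_ids)
```

If padded rows could appear between real ones, `continue` would be the safer choice than `break`.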
@Aktcob In Faster R-CNN, the sampler treats [-1,-1,-1,-1,-1] as an ignore box, but treats [200,200,200,200,-1] as a negative box, because the last -1 marks the class as background (negative). That negative box is then trained together with the positive boxes (with softmax cross-entropy only, without box regression). I don't know how YOLO handles this, but I think we should ignore this box, because the only negatives we need from this image are the other negative boxes. The [-1,-1,-1,-1,-1] sample should only be there to keep the model running normally; it should not be trained on.
@chinakook But [200,200,200,200] is a special bbox, that is, a point with 0 width, 0 height. No anchor could match it.
Invalid box should have area of 0. https://github.com/dmlc/gluon-cv/blob/master/gluoncv/model_zoo/rcnn/faster_rcnn/rcnn_target.py#L50
But I found no code to verify the area. The line below would assign the [200,200,200,200,-1] to negative, not ignore. https://github.com/dmlc/gluon-cv/blob/0f0b226e71519493e0744017f2af3b71ad77eb1c/gluoncv/model_zoo/rcnn/faster_rcnn/rcnn_target.py#L68
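One possible shape for the missing area check, as a rough sketch only. The `"ignore"` / `"valid"` labels and the helper names are hypothetical and are not the actual sampler codes used in `rcnn_target.py`.

```python
def box_area(box):
    """Area of a [xmin, ymin, xmax, ymax, ...] row, clamped at zero."""
    xmin, ymin, xmax, ymax = box[:4]
    return max(0, xmax - xmin) * max(0, ymax - ymin)


def classify_gt(box):
    """Return "ignore" for padded or zero-area rows, "valid" otherwise."""
    if box[4] < 0 or box_area(box) == 0:
        return "ignore"
    return "valid"


statuses = [
    classify_gt([-1, -1, -1, -1, -1]),        # untouched padding
    classify_gt([200, 200, 200, 200, -1]),    # padding shifted by a transform
    classify_gt([10, 10, 50, 60, 3]),         # real object
]
```

With a check like this, the shifted padding row would land in the ignore set instead of being sampled as a negative.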
@chinakook Yes, you are right. The Faster R-CNN target generator needs some changes. For now, though, training YOLOv3 works.
Did it solve the NaN problem?
There is no NaN problem when training YOLOv3.
@Aktcob Since the VOC detection dataset is modified, all object detectors will be affected. I will hold this until training of the other detectors is verified.