tf2-faster-rcnn
Too much impact from the background
Hi, first I want to say that this is the best Faster R-CNN implementation built with TensorFlow 2 that I could find.
I tried this one and changed some parts, specifically the classes and the dataset, but nothing more. When I look at the output, all of the data is labeled with the background index, so as you can imagine the result is no labels at all. Do you have any idea why this happens? The interface for filling the FasterRCNN class is similar to yours. Each image contains around 10 to 20 labels, and the image size is 1000x840. Edit: it looks like the problem of the RPN described in https://arxiv.org/pdf/1506.01497.pdf, section 3.1.3. Is there an algorithm for this implemented?
First, thank you for the compliment. Second, because negative samples dominate most images, the original Faster R-CNN paper samples positive and negative anchors at a 1:1 ratio. In this implementation you can check the config file at ./config/config.py, where the RPN training takes the same approach and samples positive and negative anchors at a ratio of 1:1. Specific variable: pos_sample_ratio = 0.5. Algorithm implementation: model/layer/anchor_to_target.py
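For intuition, here is a minimal sketch of what that 1:1 positive/negative sampling typically looks like. This is not the code from model/layer/anchor_to_target.py; the label convention (1 = positive, 0 = negative, -1 = ignored) and the n_samples value are assumptions for illustration only.

```python
import numpy as np

def sample_anchors(labels, n_samples=256, pos_sample_ratio=0.5, rng=None):
    """Illustrative sketch of balanced RPN anchor sampling.

    labels: 1-D array with 1 = positive anchor, 0 = negative anchor,
            -1 = ignored. Mirrors the 1:1 strategy from the Faster R-CNN
            paper; not the repository's actual anchor_to_target.py code.
    """
    if rng is None:
        rng = np.random.default_rng()
    labels = labels.copy()

    # Keep at most n_samples * pos_sample_ratio positive anchors.
    pos_idx = np.where(labels == 1)[0]
    n_pos = int(n_samples * pos_sample_ratio)
    if len(pos_idx) > n_pos:
        disable = rng.choice(pos_idx, size=len(pos_idx) - n_pos, replace=False)
        labels[disable] = -1

    # Fill the rest of the mini-batch with negatives so it stays close to 1:1.
    n_neg = n_samples - int(np.sum(labels == 1))
    neg_idx = np.where(labels == 0)[0]
    if len(neg_idx) > n_neg:
        disable = rng.choice(neg_idx, size=len(neg_idx) - n_neg, replace=False)
        labels[disable] = -1

    return labels
```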
So I took a closer look at the part you pointed to, and I also checked my input data again. I don't think it should be a problem, but my image size is ~3500 x 1900; because of the resizing this size shouldn't be an issue. Also, I train the whole network instead of starting from a pretrained one. The problem still exists: the background classification is too strong and no objects are detected. During training the bbox RoI loss is 0 most of the time.
Please specify:
- how many images are there in your own dataset?
- how many epochs have you trained?
If the dataset is small or there were not enough epochs, chances are the network was undertrained.
My recommendations are:
- Do a sanity check by overfitting a few images with more than enough epochs, then see if the problem is still there (see the sketch after this reply).
- Start with a pretrained network and then fine-tune it on your dataset, if your dataset is relatively small.
As a reference: it took ~10 epochs, with ~10k iterations per epoch, to train this network on VOC2007 (using an ImageNet-pretrained VGG16).
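As a rough sketch of such an overfitting sanity check (model, compute_losses, and tiny_dataset below are placeholder names, not this repository's actual API): training on a handful of images for many epochs should drive the total loss toward zero, and if it does not, the bug is more likely in the data pipeline or target assignment than in undertraining.

```python
import tensorflow as tf

def overfit_sanity_check(model, compute_losses, tiny_dataset, epochs=200):
    """Overfit a handful of images for many epochs and watch the loss.

    `model`, `compute_losses`, and `tiny_dataset` are hypothetical stand-ins
    for whatever the training code actually exposes.
    """
    optimizer = tf.keras.optimizers.SGD(learning_rate=1e-3, momentum=0.9)
    for epoch in range(epochs):
        for images, gt_boxes, gt_labels in tiny_dataset:
            with tf.GradientTape() as tape:
                # Sum RPN and RoI head losses into one scalar.
                losses = compute_losses(model, images, gt_boxes, gt_labels)
                total_loss = tf.add_n(losses)
            grads = tape.gradient(total_loss, model.trainable_variables)
            optimizer.apply_gradients(zip(grads, model.trainable_variables))
        print(f"epoch {epoch}: total_loss = {float(total_loss):.4f}")
```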
Please specify:
- how many images are there in your own dataset? => 500
- how many epochs have you trained? => at first 10, yesterday 3
If the dataset is small or not enough epochs, chances are the network was undertrained. => I debugged into the training process, and at the beginning the ground truth was recognized better than after training.
My recommendations are:
- Do a sanity check by overfitting a few images with more than enough epochs, then see if the problem is still there. => I will try
- Start with a pretrained network and then fine-tune on your dataset, if your dataset is relatively small. => I will check
Edit: Data distribution with labels converted into an index: 1: 1145 counts, 2: 152 counts, 3: 1221 counts, 4: 58 counts, 5: 49 counts, 6: 582 counts.
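For reference, a per-class tally like the one above can be produced with something like the following; it assumes the annotations are available as a list of per-image class-index lists, which may not match how this dataset is actually stored.

```python
from collections import Counter

def label_distribution(annotations):
    """Count how often each class index appears across all images.

    `annotations` is assumed to be an iterable of per-image lists of
    integer class indices (illustrative assumption only).
    """
    counts = Counter()
    for labels_per_image in annotations:
        counts.update(labels_per_image)
    return dict(sorted(counts.items()))

# Expected shape of the result, matching the tally above:
# {1: 1145, 2: 152, 3: 1221, 4: 58, 5: 49, 6: 582}
```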
Thanks for the feedback :)
So I tried it: first I ran 12 epochs from scratch, and second I ran 3 epochs with a pretrained VGG16. Both give the same test output: nothing. Next I will try to overfit it with synthetic data.
During the sanity check, you can monitor the gradient flow (input -> loss -> grads -> weights) to see if the model actually gets trained, which is a proper way to locate the bug in the code.
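A minimal sketch of such a gradient-flow check in TensorFlow 2 (again, `model` and `compute_losses` are placeholder names, not this repository's API): run one training step and print the gradient norm of each trainable variable; variables with no gradient or a norm near zero point to the part of the graph that is not learning.

```python
import tensorflow as tf

def check_gradient_flow(model, compute_losses, images, gt_boxes, gt_labels):
    """Run one step and report the gradient norm per trainable variable."""
    with tf.GradientTape() as tape:
        total_loss = tf.add_n(compute_losses(model, images, gt_boxes, gt_labels))
    grads = tape.gradient(total_loss, model.trainable_variables)
    for var, grad in zip(model.trainable_variables, grads):
        if grad is None:
            # Disconnected from the loss: this part of the model never trains.
            print(f"{var.name}: NO GRADIENT")
        else:
            print(f"{var.name}: grad norm = {float(tf.norm(grad)):.6f}")
```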