cascade-rcnn
                                
How to train it on my own dataset
Hi! I want to train cascade-rcnn on my own dataset (three classes), but I don't know how to modify the files (e.g., examples/voc/). Can you give me some instructions? Thank you!
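As a hedged starting point for the three-class question: in many Faster R-CNN-style Caffe setups, label 0 is reserved for background. Under that convention, three object classes become labels 1 to 3, and the classification output needs four entries. Whether this repository follows exactly that convention is an assumption to verify against examples/voc/. `build_label_map` and `num_cls_outputs` are hypothetical helpers, not part of the codebase.

```python
def build_label_map(class_names):
    """Map class names to integer labels, reserving 0 for background.

    Assumption: the training data expects label 0 = background and
    object classes numbered from 1 (common in Faster R-CNN-style code).
    """
    if len(set(class_names)) != len(class_names):
        raise ValueError("class names must be unique")
    label_map = {"__background__": 0}
    for idx, name in enumerate(class_names, start=1):
        label_map[name] = idx
    return label_map


def num_cls_outputs(class_names):
    """Number of classification outputs: object classes plus background."""
    return len(class_names) + 1


classes = ["class_a", "class_b", "class_c"]  # hypothetical class names
print(build_label_map(classes))
print(num_cls_outputs(classes))  # 4
```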
Hi, training models without FPN, such as res50-12s-600-rfcn-cascade, on my own dataset works fine. But when I try to train res50-15s-800-fpn-cascade on my own dataset, decode_bbox_layer cannot get a valid bbox: after the code marked "screen out high IoU boxes, to remove redundant gt boxes", valid_bbox_ids has size 0.
So, what might the problem be? Thanks. @zhaoweicai
@makefile If you don't want to remove the redundant gt boxes, you can simply set gt_iou_thr=1.0 or higher. But the more important problem is that you might not have enough proposals: in the failing case there are only gt boxes and no negative boxes. You can try lowering the proposal threshold in the "BoxGroupOutput" layer to get more proposals. Alternatively, your training may be diverging and crashing, in which case you can try a lower learning rate.
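To illustrate why gt_iou_thr=1.0 or higher disables the removal, here is a minimal sketch of IoU-based screening. It assumes the layer drops any proposal whose best IoU with a ground-truth box exceeds gt_iou_thr, treating it as a near-duplicate of a gt box. This is an illustration of the idea, not the repository's decode_bbox_layer code. The function names are hypothetical, and boxes are (x1, y1, x2, y2) in continuous coordinates.

```python
def iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes in continuous coordinates."""
    iw = min(a[2], b[2]) - max(a[0], b[0])
    ih = min(a[3], b[3]) - max(a[1], b[1])
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union


def screen_redundant(proposals, gt_boxes, gt_iou_thr):
    """Drop proposals whose best IoU with any gt box exceeds gt_iou_thr.

    Assumption: 'redundant' means a near-duplicate of a gt box that is
    already part of the training set. Since IoU never exceeds 1.0, any
    threshold >= 1.0 keeps every proposal.
    """
    kept = []
    for p in proposals:
        best = max((iou(p, g) for g in gt_boxes), default=0.0)
        if best <= gt_iou_thr:
            kept.append(p)
    return kept


gt = [(10, 10, 50, 50)]
props = [(10, 10, 50, 50), (12, 12, 50, 50), (100, 100, 140, 140)]
print(len(screen_redundant(props, gt, 0.7)))      # 1
print(len(screen_redundant(props, gt, 1.0)))      # 3
print(screen_redundant(props[:2], gt, 0.7))       # []
```

The last line mirrors the failure described above: when every proposal sits on top of a gt box and there are no other proposals, nothing survives the screening.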
@zhaoweicai Thanks! Following your advice, I lowered fg_thr in the BoxGroupOutput layer and the problem disappeared.
@zhaoweicai @makefile I am trying to train cascade rcnn on my own dataset and got the same problem. I tried lowering iou_thr in the "BoxGroupOutput" layer, but the problem is still there. Can you give me any suggestions?
The error seems related to multiple GPUs. With a single GPU, training proceeds, though not on every GPU id: GPU id 1 is fine, but GPU id 2 hits the same error as above. With 2 GPUs, I also get the same error.
@Peng-wei-Yu Try lowering the score threshold fg_thr instead of the NMS threshold.
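The difference between the two knobs can be sketched as follows. The sketch assumes fg_thr is a minimum foreground score applied before NMS, while the NMS threshold only controls how much overlap is tolerated between kept boxes. Lowering fg_thr lets more low-scoring boxes become candidates, which is why it raises the proposal count. `filter_proposals` is a hypothetical illustration, not the BoxGroupOutput implementation.

```python
def iou(a, b):
    """IoU of (x1, y1, x2, y2) boxes in continuous coordinates."""
    iw = min(a[2], b[2]) - max(a[0], b[0])
    ih = min(a[3], b[3]) - max(a[1], b[1])
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union


def filter_proposals(boxes, scores, fg_thr, nms_thr):
    """Keep boxes scoring >= fg_thr, then suppress overlaps above nms_thr.

    fg_thr decides which boxes are candidates at all; nms_thr only
    decides which candidates survive greedy NMS.
    """
    candidates = sorted(
        (pair for pair in zip(boxes, scores) if pair[1] >= fg_thr),
        key=lambda pair: pair[1],
        reverse=True,
    )
    kept = []
    for box, score in candidates:
        if all(iou(box, k) <= nms_thr for k, _ in kept):
            kept.append((box, score))
    return kept


boxes = [(0, 0, 10, 10), (1, 0, 11, 10), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.005]
print(len(filter_proposals(boxes, scores, fg_thr=0.01, nms_thr=0.7)))   # 1
print(len(filter_proposals(boxes, scores, fg_thr=0.001, nms_thr=0.7)))  # 2
```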
@jwnsu @makefile Thank you for your help. I tried lowering fg_thr and using only GPU 1, but the problem is still there. Have you tried changing --weights in train_detection? I've decided to try a different caffemodel.
FYI, the COCO models seem to work fine (e.g., coco/res50-15s-800-fpn-cascade is fine, while res101 runs out of GPU memory on a 1080 Ti). I suggest switching from the VOC flavor to the COCO one.
@Peng-wei-Yu When you change the number of GPUs, you should change the learning rate at the same time, as described in the paper.
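A minimal sketch of that adjustment, assuming the linear scaling rule: the learning rate is kept proportional to the total batch, which is proportional to the number of GPUs when each GPU holds a fixed number of images. The values below (base_lr=0.02 on 4 GPUs) are placeholders, not the repository's solver settings. Take the real base_lr and GPU count from the solver you started from.

```python
def scaled_lr(base_lr, base_gpus, new_gpus):
    """Linear scaling: keep lr / total_batch constant.

    Assumes images per GPU are unchanged, so the total batch size is
    proportional to the GPU count.
    """
    if base_gpus <= 0 or new_gpus <= 0:
        raise ValueError("GPU counts must be positive")
    return base_lr * new_gpus / base_gpus


# Placeholder reference setting: base_lr=0.02 on 4 GPUs, now training on 1.
print(scaled_lr(0.02, 4, 1))  # 0.005
```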
@jwnsu The code should have no problem with multi-GPU training or with the VOC dataset. Try running the script a couple of times to see if the problem still happens. If it does, try lowering the learning rate a little. If that still doesn't fix it, something else may be wrong.
@makefile @zhaoweicai When you trained cascade rcnn on your own data, which caffemodel did you use: your own caffemodel or ResNet-50-model-merge.caffemodel? The pictures in my own data are 1600*1200; should I change short_size and long_size in train.prototxt?
@Peng-wei-Yu If you use the author's prototxt, you should use the corresponding ResNet-50-model-merge.caffemodel, since it merges the BN layers into the Scale layers to reduce memory and speed things up. You can increase the input image size if you have enough memory, but the results may not improve much.
@makefile Thank you very much. I'll try using ResNet-50-model-merge.caffemodel.
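The merge mentioned above is an algebraic fold. At test time, a BatchNorm layer followed by a Scale layer computes gamma * (x - mean) / sqrt(var + eps) + beta per channel. That equals a single affine Scale with scale a = gamma / sqrt(var + eps) and bias b = beta - a * mean. The sketch below shows only that arithmetic in plain Python and does not read or write caffemodel files. Caffe's BatchNorm layer stores the running mean and variance multiplied by a moving-average factor kept in its third blob, so divide the first two blobs by that factor (when nonzero) before folding. eps=1e-5 is assumed (Caffe's default) and should match the original model's setting.

```python
import math


def fold_bn_into_scale(mean, var, gamma, beta, eps=1e-5):
    """Fold per-channel BatchNorm statistics into a single affine Scale.

    Returns (scale, bias) so that scale*x + bias equals
    gamma * (x - mean) / sqrt(var + eps) + beta for every channel.
    """
    scale, bias = [], []
    for m, v, g, b in zip(mean, var, gamma, beta):
        a = g / math.sqrt(v + eps)
        scale.append(a)
        bias.append(b - a * m)
    return scale, bias


def bn_then_scale(x, m, v, g, b, eps=1e-5):
    """Reference test-time BatchNorm + Scale for one value of one channel."""
    return g * (x - m) / math.sqrt(v + eps) + b


mean, var = [0.5, -1.0], [4.0, 0.25]
gamma, beta = [2.0, 0.5], [0.1, -0.3]
scale, bias = fold_bn_into_scale(mean, var, gamma, beta)
for c in range(2):
    x = 3.0
    merged = scale[c] * x + bias[c]
    print(abs(merged - bn_then_scale(x, mean[c], var[c], gamma[c], beta[c])) < 1e-9)
```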
@makefile @Peng-wei-Yu In the BoxGroupOutput layer the original setting is 0.001; what value did you finally set?
@makefile @Peng-wei-Yu
When training with a batch size of 1, each of my training pictures contains at least one sample, yet total positive is 0 in many iterations and my RPN loss is 0. Why? Have you encountered this problem?

@GuoxingYan I set fg_thr: 0.01 or 0 in all BoxGroupOutput layers. If your number of positive RoIs is always 0, your dataset may have a problem.
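When total positive stays at 0, it can help to count offline how many proposals would qualify as positives. Cascade R-CNN trains its three stages with increasing IoU thresholds (0.5, 0.6, 0.7 in the paper). The hypothetical helper below counts, for each threshold, the proposals whose best IoU with any gt box reaches it. If even the 0.5 count is zero on real proposals, the annotations or box coordinates are a likely suspect. This is a diagnostic sketch, not the repository's sampling code.

```python
def iou(a, b):
    """IoU of (x1, y1, x2, y2) boxes in continuous coordinates."""
    iw = min(a[2], b[2]) - max(a[0], b[0])
    ih = min(a[3], b[3]) - max(a[1], b[1])
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union


def count_positives(proposals, gt_boxes, thresholds=(0.5, 0.6, 0.7)):
    """For each IoU threshold, count proposals whose best gt IoU reaches it."""
    best = [max((iou(p, g) for g in gt_boxes), default=0.0) for p in proposals]
    return {t: sum(1 for v in best if v >= t) for t in thresholds}


gt = [(0, 0, 100, 100)]
proposals = [(0, 0, 100, 100), (0, 0, 100, 80), (0, 0, 100, 60), (200, 200, 300, 300)]
print(count_positives(proposals, gt))   # {0.5: 3, 0.6: 3, 0.7: 2}
print(count_positives(proposals, []))   # {0.5: 0, 0.6: 0, 0.7: 0}
```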
@makefile Did you try changing short_size and long_size in train.prototxt? When I change only short_size or long_size, I get an error.
@GuoxingYan I did not try changing that. Since the network uses Deconvolution layers to upsample, the size may need to be a multiple of 32, 64, or larger.
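A hedged sketch of that size constraint. It assumes short_size and long_size mean "scale the short side to short_size unless the long side would exceed long_size". It also assumes padding up to a multiple of the coarsest stride is an acceptable way to satisfy the constraint. Neither assumption is confirmed from the repository's data layer. `resized_shape` and `pad_to_multiple` are hypothetical helpers, and long_size=1333 is a placeholder.

```python
import math


def resized_shape(height, width, short_size, long_size):
    """Assumed rule: short side -> short_size, long side capped at long_size."""
    scale = short_size / min(height, width)
    if round(max(height, width) * scale) > long_size:
        scale = long_size / max(height, width)
    return round(height * scale), round(width * scale)


def pad_to_multiple(n, stride):
    """Smallest multiple of stride that is >= n."""
    return math.ceil(n / stride) * stride


# A 1600*1200 picture; long_size=1333 is a placeholder value.
h, w = resized_shape(1200, 1600, short_size=800, long_size=1333)
print(h, w)                                            # 800 1067
print(pad_to_multiple(h, 32), pad_to_multiple(w, 32))  # 800 1088
print(pad_to_multiple(h, 64), pad_to_multiple(w, 64))  # 832 1088
```

The 1067-pixel side is not a multiple of 32, which is the kind of mismatch that can break feature-map alignment when upsampled maps are combined.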
@makefile thank you very much!!
@makefile Did you run into the following problem when training FPN?

@GuoxingYan I didn't run into it. The integer seems to be abnormally big.
@Peng-wei-Yu @zhaoweicai My own images are 960*1280. I tried using ResNet-50-model-merge.caffemodel, but I also get this problem.

@makefile @zhaoweicai @Peng-wei-Yu When I was training, I found that short_size in detection_data_param in train.prototxt is 800, which is exactly equal to img_width and img_height in proposal_target_param. So if I change short_size to 320, do img_width and img_height also need to be changed to 320?
@GuoxingYan I think they do.
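Because these values are duplicated across layers, a quick consistency check can save a failed run. The sketch below scans prototxt text with a regular expression for the field names quoted above (short_size, img_width, img_height). It encodes this thread's assumption that all three should be changed together. It is a hypothetical helper, not a protobuf parser, and the example prototxt text is made up.

```python
import re


def field_values(prototxt_text, name):
    """All integer values written as `name: <int>` anywhere in the text."""
    pattern = re.compile(r"\b" + re.escape(name) + r"\s*:\s*(\d+)")
    return [int(v) for v in pattern.findall(prototxt_text)]


def check_sizes(prototxt_text):
    """Report img_width/img_height values that differ from short_size."""
    shorts = sorted(set(field_values(prototxt_text, "short_size")))
    if len(shorts) != 1:
        return [f"expected one short_size value, found {shorts}"]
    short = shorts[0]
    problems = []
    for name in ("img_width", "img_height"):
        bad = sorted(v for v in set(field_values(prototxt_text, name)) if v != short)
        if bad:
            problems.append(f"{name} {bad} != short_size {short}")
    return problems


example = """
layer {
  detection_data_param { short_size: 320 }
}
layer {
  proposal_target_param {
    img_width: 800
    img_height: 800
  }
}
"""
for problem in check_sizes(example):
    print(problem)
```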
@makefile I used it to train on my own dataset. How can I get the output for every picture?
@licy5152 I wrote a Python script, CascadeRCNN-demo.py, that imitates the MATLAB code; you can modify it for your use.
@makefile Your demo.py shows up as an invalid link.
@GuoxingYan It's probably a problem with your network connection.
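For the per-picture output question above, the shared demo script is the reference. As a generic, hedged sketch of the batch loop around it, the code below calls a detector on every image and writes one text line per detection above a score threshold. The `detect` callable stands in for whatever the demo does per image. The detection tuple layout (class_id, score, x1, y1, x2, y2) and the output format are assumptions, not CascadeRCNN-demo.py's actual interface.

```python
import os


def format_detections(image_id, detections, score_thr=0.5):
    """One text line per detection above score_thr, highest score first.

    Each detection is assumed to be (class_id, score, x1, y1, x2, y2).
    """
    lines = []
    for cls, score, x1, y1, x2, y2 in sorted(detections, key=lambda d: d[1], reverse=True):
        if score >= score_thr:
            lines.append(f"{image_id} {cls} {score:.3f} {x1:.1f} {y1:.1f} {x2:.1f} {y2:.1f}")
    return lines


def run_on_images(image_paths, detect, out_path, score_thr=0.5):
    """Run `detect` (a hypothetical per-image callable) on each image; write all lines."""
    with open(out_path, "w") as f:
        for path in image_paths:
            image_id = os.path.splitext(os.path.basename(path))[0]
            for line in format_detections(image_id, detect(path), score_thr):
                f.write(line + "\n")


def fake_detect(path):
    # Stand-in for a real network forward pass plus post-processing.
    return [(1, 0.92, 10, 20, 110, 220), (2, 0.30, 5, 5, 50, 50)]


print(format_detections("img_0001", fake_detect("img_0001.jpg")))
# ['img_0001 1 0.920 10.0 20.0 110.0 220.0']
```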
@makefile @zhaoweicai
When I was training on my own dataset, the following issue happened. However, I have already checked that no box in window_file.txt has xmin = 1664 and xmax = 636. I also could not find the bbox_util.cpp file under the workspace directory. Could you help me solve this issue? Thanks a lot.

@PacteraKun The situation you encountered is unusual; please check carefully.
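One way to check carefully is to validate every box before training. The sketch below works on already-parsed annotations (image width, height, and (x1, y1, x2, y2) boxes) rather than on window_file.txt directly, because the exact file layout is not shown in this thread. Writing a parser for your window file is left to you. The checker flags inverted or empty boxes, such as the xmin = 1664, xmax = 636 case above. It also flags boxes that fall outside the recorded image size, for example when width and height were swapped. `validate_boxes` is a hypothetical helper.

```python
def validate_boxes(width, height, boxes):
    """Return human-readable problems for one image's (x1, y1, x2, y2) boxes."""
    problems = []
    for i, (x1, y1, x2, y2) in enumerate(boxes):
        if x1 >= x2 or y1 >= y2:
            problems.append(f"box {i}: inverted or empty ({x1}, {y1}, {x2}, {y2})")
        inside = (0 <= x1 <= width and 0 <= x2 <= width
                  and 0 <= y1 <= height and 0 <= y2 <= height)
        if not inside:
            problems.append(f"box {i}: outside {width}x{height} image ({x1}, {y1}, {x2}, {y2})")
    return problems


print(validate_boxes(1280, 960, [(10, 10, 50, 50)]))    # []
for p in validate_boxes(1280, 960, [(1664, 0, 636, 10)]):
    print(p)
```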