tf-faster-rcnn
How to train Faster R-CNN on my own dataset?
Hi, everyone: I want to train Faster R-CNN on my own dataset, which has only two classes. I do not know how to change the code. Could you do me a favor and describe the changes in detail? Thank you very much!
You can refer to #17
@philokey You mean "add"? Like this: self._classes = ('background', # always index 0 'aeroplane', 'bicycle', 'bird', 'boat', 'bottle', 'bus', 'car', 'cat', 'chair', 'cow', 'diningtable', 'dog', 'horse', 'motorbike', 'person', 'pottedplant', 'sheep', 'sofa', 'train', 'tvmonitor','0','1')
or "replace"? Like this: self._classes = ('background', '0','1')
I did it the "replace" way, and got an error saying "invalid input arguments". Is something wrong?
@tp227 You should use "replace". Can you paste the error details? By the way, did you check the format of the input data?
@philokey You are so nice! The e-mail failed to send, so I am pasting the content as follows:
When I run "./experiments/scripts/test_faster_rcnn.sh 0 pascal_voc vgg16", I get the following:
“InvalidArgumentError (see above for traceback): Assign requires shapes of both tensors to match. lhs shape= [4096,3] rhs shape= [4096,21] [[Node: save/Assign_3 = Assign[T=DT_FLOAT, _class=["loc:@vgg_16/cls_score/weights"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/gpu:0"](vgg_16/cls_score/weights, save/RestoreV2_3/_7)]]”
My dataset is annotated in the pascal_voc format, and my VOC2007 directory has replaced the original VOC2007 directory. Each picture has a name like 10001000.jpg, meaning that the targets in this picture are "1", "0", "0", "0", "1", "0", "0" and "0".
In addition, I used labelImg to label the targets. The output XML (the '<' before each line has been deleted because it broke the formatting) looks like this:
annotation verified="no">
folder>JPEGImages
filename>00010001.jpg
# path>/home/robot/tf-faster-rcnn-master/VOCdevkit/VOC2007/JPEGImages/00010001.jpg
source>
database>Unknown
/source>
size>
width>1200
height>900
depth>3
/size>
segmented>0
object>
name>no
pose>Unspecified
truncated>0
Difficult>0</Difficult>
bndbox>
xmin>306
ymin>72
xmax>379
ymax>801
/bndbox>
What is wrong?
Should config.py and vgg16.py also be updated? If so, how?
@tp227 Have you solved this problem? Can you give some guidance on what should be changed?
@xuewenyuan Sorry, not yet.
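A note on the shape mismatch reported above: the message says the graph was built with a 3-way classifier (`[4096,3]`) while the checkpoint being restored holds a 21-way one (`[4096,21]`). That would be consistent with restoring a checkpoint trained on the 20 VOC classes plus background after the class list was shortened. Below is a minimal sketch of the arithmetic, assuming the usual Faster R-CNN head layout (one score per class, four box offsets per class). `head_shapes` and `fc_dim` are names made up for this illustration and are not part of the repository.

```python
def head_shapes(classes, fc_dim=4096):
    """Return the expected weight shapes of the two output layers.

    Assumes the usual Faster R-CNN head: one score per class and
    four box-regression outputs per class.
    """
    num_classes = len(classes)
    return {
        "cls_score": (fc_dim, num_classes),
        "bbox_pred": (fc_dim, 4 * num_classes),
    }

# Placeholder names stand in for the 20 VOC classes; only the count matters here.
voc_classes = ("__background__",) + tuple("class%d" % i for i in range(20))
my_classes = ("__background__", "0", "1")

checkpoint = head_shapes(voc_classes)  # what a VOC-trained checkpoint would hold
graph = head_shapes(my_classes)        # what the shortened class list builds

# Layers whose shapes differ cannot be restored from that checkpoint.
mismatched = [name for name in graph if graph[name] != checkpoint[name]]
print(checkpoint["cls_score"], graph["cls_score"], mismatched)
```

If this is the cause, the test script needs a checkpoint that was trained with the same class list it is run with.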
@endernewton Could you please give us instructions for training on a custom dataset? We ran into some errors that we still cannot figure out. Thanks!
I'm wondering whether pascal_voc (the class list) is the only file that needs to be modified to train on one's own dataset.
@paulcx I have changed pascal_voc.py. It did not work.
From my experience, there is only one place in pascal_voc.py that needs to be modified for training alone. You would need to change a similar place and create a different network architecture for demo or test.
In pascal_voc.py, I changed the original classes into my classes. Is there anything else that should be modified?
self._classes = ('__background__', # always index 0
'aeroplane', 'bicycle', 'bird', 'boat',
'bottle', 'bus', 'car', 'cat', 'chair',
'cow', 'diningtable', 'dog', 'horse',
'motorbike', 'person', 'pottedplant',
'sheep', 'sofa', 'train', 'tvmonitor')
self._classes = ('__background__', # always index 0
'figure', 'formula', 'table')
@xuewenyuan I may do that at some point, but in the meantime this is great practice for you in setting it up on new datasets.
@xuewenyuan You also need to check whether the annotations of your dataset are 0-based or 1-based. I suggest you compare voc with ilsvrc; you will then see which part needs to be modified.
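To make the 0-based versus 1-based point concrete: PASCAL VOC annotations count pixels from 1, so a VOC-style loader conventionally subtracts 1 from each coordinate. If a dataset is already 0-based, the same subtraction pushes a box that touches the left or top edge to -1. Here is a minimal sketch using only the standard library. `read_boxes` and its `one_based` flag are hypothetical names for illustration, not functions from the repository.

```python
import xml.etree.ElementTree as ET

def read_boxes(xml_text, one_based=True):
    """Parse VOC-style <object> entries into (name, x1, y1, x2, y2) tuples.

    When one_based is True, coordinates are shifted to 0-based pixel
    indexes by subtracting 1, as is conventional for PASCAL VOC.
    """
    shift = 1 if one_based else 0
    boxes = []
    for obj in ET.fromstring(xml_text).findall("object"):
        bb = obj.find("bndbox")
        coords = [int(float(bb.find(tag).text)) - shift
                  for tag in ("xmin", "ymin", "xmax", "ymax")]
        boxes.append((obj.find("name").text.strip(), *coords))
    return boxes

# A made-up annotation whose box touches the left edge (xmin = 0).
sample = """
<annotation>
  <object>
    <name>table</name>
    <bndbox><xmin>0</xmin><ymin>10</ymin><xmax>50</xmax><ymax>60</ymax></bndbox>
  </object>
</annotation>
"""

# Treating 0-based data as 1-based yields a negative coordinate.
wrong = read_boxes(sample, one_based=True)
right = read_boxes(sample, one_based=False)
print(wrong, right)
```

Whether your own annotations need the shift depends on the tool that produced them, so it is worth checking a box that touches an image edge.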
@endernewton @XiongweiWu Thanks. This work is a little difficult for me now. But I will try.
@philokey @endernewton @XiongweiWu I modified my dataset format, and the above error has disappeared. But I got the following result:
4952 validation roidb entries
Traceback (most recent call last):
  File "./tools/trainval_net.py", line 164, in
    max_iters=args.max_iters)
  File "/home/robot/tf-faster-rcnn-master/tools/../lib/model/train_val.py", line 371, in train_net
    roidb = filter_roidb(roidb)
  File "/home/robot/tf-faster-rcnn-master/tools/../lib/model/train_val.py", line 360, in filter_roidb
    filtered_roidb = [entry for entry in roidb if is_valid(entry)]
  File "/home/robot/tf-faster-rcnn-master/tools/../lib/model/train_val.py", line 349, in is_valid
    overlaps = entry['max_overlaps']
KeyError: 'max_overlaps'
Command exited with non-zero status 1
2.65user 0.08system 0:02.75elapsed 99%CPU (0avgtext+0avgdata 291384maxresident)k
0inputs+56outputs (0major+65477minor)pagefaults 0swaps
I don't know what is wrong here. Maybe I don't understand 'max_overlaps' clearly. Could you please explain exactly what it is?
@tp227 I've recently loaded and run my own dataset.
In my case I changed the _load_pascal_annotation in pascal_voc.py model file in order to read a simple text format instead of the xml format because it is quite verbose.
The function will return boxes, gt_classes, gt_overlaps, seg_areas.
gt_overlaps, seg_areas are calculated so the only thing you need to worry about is the boxes coordinates and classes.
The error you get above comes from the data validation function, which probably indicates that your training data isn't correct. Maybe you have boxes that aren't inside the image, or they might be inverted so that they don't have any area, for instance if x1 > x2, and so on.
@tp227 I have the same error. How did you solve the problem? Thanks.
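As an illustration of the kind of loader described in this comment, here is a minimal sketch that reads a simple text format and returns the four fields named above. It is not the commenter's actual code: the one-box-per-line format `class_name x1 y1 x2 y2` and the function name `load_text_annotation` are assumptions, and plain lists stand in for the NumPy arrays a real loader would return.

```python
def load_text_annotation(text, classes):
    """Parse lines of the form 'class_name x1 y1 x2 y2' (a made-up format)."""
    class_to_ind = {name: i for i, name in enumerate(classes)}
    boxes, gt_classes, gt_overlaps, seg_areas = [], [], [], []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        name, x1, y1, x2, y2 = line.split()
        x1, y1, x2, y2 = int(x1), int(y1), int(x2), int(y2)
        cls = class_to_ind[name]
        boxes.append([x1, y1, x2, y2])
        gt_classes.append(cls)
        # One row per box: overlap 1.0 with its own class, 0.0 elsewhere.
        gt_overlaps.append([1.0 if i == cls else 0.0 for i in range(len(classes))])
        # Area in pixels, counting both edge pixels.
        seg_areas.append((x2 - x1 + 1) * (y2 - y1 + 1))
    return {"boxes": boxes, "gt_classes": gt_classes,
            "gt_overlaps": gt_overlaps, "seg_areas": seg_areas}

classes = ("__background__", "figure", "formula", "table")
entry = load_text_annotation("figure 10 20 109 69\ntable 0 0 9 9\n", classes)
print(entry)
```

The overlaps and areas follow mechanically from the boxes and classes, which matches the remark that only those two need attention.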
@tp227 @w39865008 I have successfully run this framework.
I downloaded the repository again to make sure all files were up to date. tf-faster-rcnn/lib/datasets/pascal_voc.py is the only file I modified. I replaced self._classes with the classes of my data. There is no need to change the number of classes in the networks.
The most important thing is to confirm that your data are in the correct pascal voc or coco format. All my earlier errors came from this, even from something as small as a difference in carriage returns.
If you are confident that your environment is configured correctly, just check whether your data are in the right format.
And thanks @endernewton again for this excellent work.
@xuewenyuan I solved the error by clearing "default". Whenever you modify files, "default" and "cache" need to be cleared. Thanks for your answer.
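The clean-up step described here can be scripted. Below is a minimal sketch, assuming the cached annotations live under `data/cache` and the checkpoints under an `output/.../default` directory. Both paths, the example file name, and the helper `clear_stale` are assumptions to check against your own checkout.

```python
import shutil
import tempfile
from pathlib import Path

def clear_stale(root, relative_dirs):
    """Delete each listed directory under root if it exists; return those removed."""
    removed = []
    for rel in relative_dirs:
        target = Path(root) / rel
        if target.is_dir():
            shutil.rmtree(target)
            removed.append(rel)
    return removed

# Demonstration on a throwaway directory tree rather than a real checkout.
with tempfile.TemporaryDirectory() as root:
    cache_dir = Path(root) / "data" / "cache"
    cache_dir.mkdir(parents=True)
    (cache_dir / "example_gt_roidb.pkl").write_text("stale")  # illustrative file name
    removed = clear_stale(
        root, ["data/cache", "output/vgg16/voc_2007_trainval/default"])
    cache_gone = not cache_dir.exists()
print(removed, cache_gone)
```

Directories that do not exist are skipped, so the helper is safe to run before a first training run as well.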
I am getting an error:
`Fix VGG16 layers..
Fixed.
iter: 20 / 70000, total loss: 0.839219
rpn_loss_cls: 0.482955 rpn_loss_box: 0.259976 loss_cls: 0.071895 loss_box: 0.024393 lr: 0.001000 speed: 0.752s / iter
iter: 40 / 70000, total loss: 0.875337
rpn_loss_cls: 0.423560 rpn_loss_box: 0.350305 loss_cls: 0.086583 loss_box: 0.014890 lr: 0.001000 speed: 0.618s / iter
iter: 60 / 70000, total loss: 1.133546
rpn_loss_cls: 0.348662 rpn_loss_box: 0.595944 loss_cls: 0.135538 loss_box: 0.053402 lr: 0.001000 speed: 0.508s / iter
iter: 80 / 70000, total loss: 0.629528
rpn_loss_cls: 0.403030 rpn_loss_box: 0.180778 loss_cls: 0.034033 loss_box: 0.011688 lr: 0.001000 speed: 0.455s / iter
iter: 100 / 70000, total loss: 0.750905
rpn_loss_cls: 0.374346 rpn_loss_box: 0.299970 loss_cls: 0.076589 loss_box: 0.000000 lr: 0.001000 speed: 0.435s / iter
/home/gabbar/ML/tf-faster-rcnn/tools/../lib/model/bbox_transform.py:28: RuntimeWarning: invalid value encountered in log
targets_dw = np.log(gt_widths / ex_widths)
iter: 120 / 70000, total loss: nan
rpn_loss_cls: 0.681892 rpn_loss_box: nan loss_cls: 1.877977 loss_box: 0.000000 lr: 0.001000 speed: 0.416s / iter
iter: 140 / 70000, total loss: nan
rpn_loss_cls: 0.672521 rpn_loss_box: nan loss_cls: 1.581246 loss_box: 0.000000 lr: 0.001000 speed: 0.402s / iter`
I am getting NaN for rpn_loss_box after the RuntimeWarning.
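The warning points at `np.log(gt_widths / ex_widths)`, which is undefined when a ground-truth width is zero or negative, for example when `xmax < xmin` in an annotation. Below is a minimal pure-Python sketch of the standard Faster R-CNN box-regression targets for a single box pair. `box_targets` is a hypothetical name, and raising an error on a degenerate box is a choice made for this illustration rather than what the repository does.

```python
import math

def box_targets(ex, gt):
    """Regression targets (dx, dy, dw, dh) from an example box to a ground-truth box.

    Boxes are (x1, y1, x2, y2) with inclusive pixel coordinates.
    """
    ex_w, ex_h = ex[2] - ex[0] + 1.0, ex[3] - ex[1] + 1.0
    gt_w, gt_h = gt[2] - gt[0] + 1.0, gt[3] - gt[1] + 1.0
    if min(ex_w, ex_h, gt_w, gt_h) <= 0:
        # log() of a non-positive ratio is what produces NaN in the NumPy version.
        raise ValueError("degenerate box: width and height must be positive")
    ex_cx, ex_cy = ex[0] + 0.5 * ex_w, ex[1] + 0.5 * ex_h
    gt_cx, gt_cy = gt[0] + 0.5 * gt_w, gt[1] + 0.5 * gt_h
    return ((gt_cx - ex_cx) / ex_w, (gt_cy - ex_cy) / ex_h,
            math.log(gt_w / ex_w), math.log(gt_h / ex_h))

# Identical boxes need no correction.
same = box_targets((0, 0, 9, 9), (0, 0, 9, 9))

# A ground-truth box with xmax < xmin has negative width.
try:
    box_targets((0, 0, 9, 9), (50, 0, 20, 9))
    inverted_ok = True
except ValueError:
    inverted_ok = False
print(same, inverted_ok)
```

Once one target is NaN the loss stays NaN, which fits the log above, so checking the annotations for inverted or out-of-range boxes is a reasonable first step.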
@kalaspuffar I saw your approach in #95. Could you describe it in detail?
1. How do I modify the _load_pascal_annotation function in pascal_voc.py?
2. How do I update the factory.py file to include a new dataset?
3. How do I update ./experiments/scripts/train_faster_rcnn.sh and ./experiments/scripts/test_faster_rcnn.sh to include the dataset?
It would be best if you could provide your code or a detailed guide. Thank you for your reply!
import os
import pandas as pd

trainset_path = '/bigdisk/.../paris_trainset.pkl'  # added
testset_path = '/bigdisk/.../paris_testset.pkl'  # added
dataset_path = '/bigdisk/.../paris/'  # added

if not os.path.exists(trainset_path) or not os.path.exists(testset_path):
    # Build the image lists once and cache them as pickles.
    trainset_dir = os.path.join(dataset_path, 'train2014')
    testset_dir = os.path.join(dataset_path, 'val2014')
    trainset = pd.DataFrame({'image_path': [os.path.join(trainset_dir, x) for x in os.listdir(trainset_dir)]})
    testset = pd.DataFrame({'image_path': [os.path.join(testset_dir, x) for x in os.listdir(testset_dir)]})
    trainset.to_pickle(trainset_path)
    testset.to_pickle(testset_path)
else:
    # Reuse the cached lists.
    trainset = pd.read_pickle(trainset_path)
    testset = pd.read_pickle(testset_path)
This code can help you build your own .pkl dataset from images, and read it back if it has already been built.
@kalaspuffar Hi, can you give me some advice on how you ran this on your own dataset? Do I need to download the pre-trained model and weights? I tried the steps the author offers but I got 0 AP.
@xuewenyuan Hi, I converted my dataset to the same format as PASCAL VOC, but I got 0 AP. I want to use the code for logo detection; do I still need to download the pre-trained model? Could you please give me some advice?
@zdm123 Hi, I also have this problem. Have you solved it? If you have, please help me. Thank you so much!
In demo.py, line 144, change the '21' to len(CLASSES).
Run demo.py again. It worked. :D
It is perhaps due to errors in the "bbox" coordinates (x < 0 or x > img_width) in your Annotations. (At least that was my case.)
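A quick way to look for such coordinate errors is to scan every box before training. Here is a minimal sketch, assuming boxes are already 0-based `(x1, y1, x2, y2)` tuples with inclusive corners. `find_bad_boxes` is a hypothetical helper, and the 1200 x 900 image size is borrowed from the sample annotation earlier in the thread.

```python
def find_bad_boxes(boxes, img_width, img_height):
    """Return (index, reason) for each 0-based (x1, y1, x2, y2) box that looks invalid."""
    problems = []
    for i, (x1, y1, x2, y2) in enumerate(boxes):
        if x1 < 0 or y1 < 0:
            problems.append((i, "negative coordinate"))
        elif x2 >= img_width or y2 >= img_height:
            problems.append((i, "outside image"))
        elif x2 < x1 or y2 < y1:
            problems.append((i, "inverted corners"))
    return problems

boxes = [
    (305, 71, 378, 800),   # fine
    (-1, 10, 50, 60),      # x1 below zero
    (100, 10, 1200, 60),   # x2 past the right edge of a 1200-pixel-wide image
    (400, 10, 300, 60),    # x2 smaller than x1
]
problems = find_bad_boxes(boxes, img_width=1200, img_height=900)
print(problems)
```

Any box it reports is worth fixing or dropping in the annotation files, followed by clearing the cached annotations.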
When I run pascal_voc.py with only self.classes changed to the classes of my custom dataset, I get the following error:
Traceback (most recent call last):
File "./tools/trainval_net.py", line 105, in
Can someone please help me with this issue?
1. I have successfully run the demo, but the terminal shows the following:
CUDA driver version is insufficient for CUDA runtime version
What should I do?
2. When I run:
./experiments/scripts/train_faster_rcnn.sh 0 pascal_voc vgg16
I get an error like this:
Traceback (most recent call last):
File "./tools/trainval_net.py", line 97, in
Could you tell me in detail how to solve this? Thanks very much!
Please check whether the variable self.classes contains any mistakes.
In pascal_voc.py, I changed the original classes into my classes. Is there anything which should be modified?
self._classes = ('__background__', # always index 0 'aeroplane', 'bicycle', 'bird', 'boat', 'bottle', 'bus', 'car', 'cat', 'chair', 'cow', 'diningtable', 'dog', 'horse', 'motorbike', 'person', 'pottedplant', 'sheep', 'sofa', 'train', 'tvmonitor')
self._classes = ('__background__', # always index 0 'figure', 'formula', 'table')
Have you figured out this problem? I encountered the same problem. I trained on my own dataset before: I modified some files (this file included) and it worked. Now I am training on another dataset and it does not work.
@Emma-uestc Try clearing "default" and "cache".
I did clear the cache and output directories each time. Do you have any idea how to use a GPU that is not included in the GPU list above in this project? How do I set the gpu-arch?
@abhiML Hi, I got the same problem. How did you fix it? I'm struggling with tons of NaN...
I am following the steps from the repository webpage, and ran the default test script, but I get zeros everywhere.
GPU_ID=0 ./experiments/scripts/test_faster_rcnn.sh $GPU_ID pascal_voc_0712 res101
... ...
Reading annotation for 4950/4952
Reading annotation for 4951/4952
Reading annotation for 4952/4952
Saving cached annotations to /home/sagarwal/tf-faster-rcnn/data/VOCdevkit2007/annotations_cache/test_annots.pkl
AP for aeroplane = 0.0000
AP for bicycle = 0.0000
AP for bird = 0.0000
AP for boat = 0.0000
AP for bottle = 0.0000
AP for bus = 0.0000
AP for car = 0.0000
AP for cat = 0.0000
AP for chair = 0.0000
AP for cow = 0.0000
AP for diningtable = 0.0000
AP for dog = 0.0000
AP for horse = 0.0000
AP for motorbike = 0.0000
AP for person = 0.0000
AP for pottedplant = 0.0000
AP for sheep = 0.0000
AP for sofa = 0.0000
AP for train = 0.0000
AP for tvmonitor = 0.0000
Mean AP = 0.0000
Results: 0.000 0.000 0.000 0.000 ... ...
Please let me know what I am doing wrong.