tf-faster-rcnn

How to train Faster R-CNN on my own dataset?

Open tp227 opened this issue 7 years ago • 35 comments

Hi, everyone: I want to train Faster R-CNN on my own dataset, which has only two classes. I do not know how to change the code. Could you give me the details of the required changes? Thank you very much!

tp227 avatar May 10 '17 14:05 tp227

You can refer to #17

philokey avatar May 11 '17 05:05 philokey

@philokey You mean "add"? Like this:

self._classes = ('background',  # always index 0
                 'aeroplane', 'bicycle', 'bird', 'boat',
                 'bottle', 'bus', 'car', 'cat', 'chair',
                 'cow', 'diningtable', 'dog', 'horse',
                 'motorbike', 'person', 'pottedplant',
                 'sheep', 'sofa', 'train', 'tvmonitor', '0', '1')

or "replace"? Like this:

self._classes = ('background', '0', '1')

I used "replace" and got an "invalid input arguments" error. Is something wrong?

tp227 avatar May 11 '17 05:05 tp227

@tp227 You should use "replace". Can you paste the error details? By the way, did you check the format of the input data?

philokey avatar May 11 '17 06:05 philokey

@philokey you are so nice! The e-mail failed to send, so I am pasting the content as follows:

When I run "./experiments/scripts/test_faster_rcnn.sh 0 pascal_voc vgg16", I get the following:

“InvalidArgumentError (see above for traceback): Assign requires shapes of both tensors to match. lhs shape= [4096,3] rhs shape= [4096,21] [[Node: save/Assign_3 = Assign[T=DT_FLOAT, _class=["loc:@vgg_16/cls_score/weights"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/gpu:0"](vgg_16/cls_score/weights, save/RestoreV2_3/_7)]]”

My dataset is annotated in pascal_voc format. My own VOC2007 folder has replaced the original VOC2007 folder. Each picture has a name like 10001000.jpg, meaning that the targets in this picture are "1", "0", "0", "0", "1", "0", "0" and "0".

In addition, I used labelImg to label the targets. The output XML (the '<' before each line has been deleted because of formatting problems) looks like this:

annotation verified="no">
  folder>JPEGImages
  filename>00010001.jpg #
  path>/home/robot/tf-faster-rcnn-master/VOCdevkit/VOC2007/JPEGImages/00010001.jpg
  source>
    database>Unknown
  /source>
  size>
    width>1200
    height>900
    depth>3
  /size>
  segmented>0
  object>
    name>no
    pose>Unspecified
    truncated>0
    Difficult>0</Difficult>
    bndbox>
      xmin>306
      ymin>72
      xmax>379
      ymax>801
    /bndbox>

What is wrong?

And maybe config.py and vgg16.py should be updated? How?

tp227 avatar May 11 '17 07:05 tp227

@tp227 Have you solved this problem? Can you give some guidance on what should be changed?

xuewenyuan avatar May 15 '17 08:05 xuewenyuan

@xuewenyuan sorry, not yet.

tp227 avatar May 15 '17 08:05 tp227

@endernewton Could you please give us instructions for training on a custom dataset? We ran into some errors and still cannot figure them out. Thanks!

xuewenyuan avatar May 16 '17 03:05 xuewenyuan

I'm wondering whether pascal_voc (the classes part) is the only file that needs to be modified for training on your own dataset.

paulcx avatar May 17 '17 01:05 paulcx

@paulcx I have changed pascal_voc.py. It did not work.

xuewenyuan avatar May 17 '17 07:05 xuewenyuan

From my experience, there is only one place in pascal_voc.py that needs to be modified if you just want to train. For demo or test, you would need to change a similar place and create a different network architecture.

paulcx avatar May 17 '17 20:05 paulcx

In pascal_voc.py, I changed the original classes into my classes. Is there anything else that should be modified?

self._classes = ('__background__',  # always index 0
                  'aeroplane', 'bicycle', 'bird', 'boat',
                  'bottle', 'bus', 'car', 'cat', 'chair',
                  'cow', 'diningtable', 'dog', 'horse',
                  'motorbike', 'person', 'pottedplant',
                  'sheep', 'sofa', 'train', 'tvmonitor')
self._classes = ('__background__',  # always index 0
                      'figure', 'formula', 'table')

xuewenyuan avatar May 18 '17 02:05 xuewenyuan

@xuewenyuan I may do that at some point, but in the meantime this is great practice for you to set it up on new datasets.

endernewton avatar May 18 '17 10:05 endernewton

@xuewenyuan You also need to check whether the annotations of your dataset are 0-based or 1-based. I suggest you compare voc with ilsvrc, and you will see which parts need to be modified.
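One rough way to check the indexing convention is to look for coordinates equal to 0, which cannot occur in 1-based annotations. The sketch below assumes boxes are given as (xmin, ymin, xmax, ymax) tuples and that the loader expects to subtract 1 from each coordinate, as is usual for PASCAL VOC style data; `guess_index_base` and `to_zero_based` are hypothetical helper names, not functions from the repository.

```python
def guess_index_base(boxes):
    """Return 0 if any coordinate is 0 (impossible in 1-based data), else 1.

    A result of 1 is only a guess: 0-based data may simply contain no box
    that touches the top or left edge of the image.
    """
    for box in boxes:
        if min(box) == 0:
            return 0
    return 1


def to_zero_based(box, index_base):
    """Shift a box to 0-based pixel indices, given the annotation's base."""
    return tuple(c - index_base for c in box)


# The second box is made up for illustration.
boxes = [(306, 72, 379, 801), (0, 10, 50, 60)]
base = guess_index_base(boxes)
print(base)
print([to_zero_based(b, base) for b in boxes])
```

If the data turns out to be 0-based while the loader still subtracts 1, a box starting at 0 ends up with a negative coordinate, so one of the two sides has to be adjusted.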

XiongweiWu avatar May 18 '17 12:05 XiongweiWu

@endernewton @XiongweiWu Thanks. This work is a little difficult for me now. But I will try.

xuewenyuan avatar May 19 '17 01:05 xuewenyuan

@philokey @endernewton @XiongweiWu I modified my dataset format, and the above error has disappeared. But I got the following result:

4952 validation roidb entries
Traceback (most recent call last):
  File "./tools/trainval_net.py", line 164, in <module>
    max_iters=args.max_iters)
  File "/home/robot/tf-faster-rcnn-master/tools/../lib/model/train_val.py", line 371, in train_net
    roidb = filter_roidb(roidb)
  File "/home/robot/tf-faster-rcnn-master/tools/../lib/model/train_val.py", line 360, in filter_roidb
    filtered_roidb = [entry for entry in roidb if is_valid(entry)]
  File "/home/robot/tf-faster-rcnn-master/tools/../lib/model/train_val.py", line 349, in is_valid
    overlaps = entry['max_overlaps']
KeyError: 'max_overlaps'
Command exited with non-zero status 1
2.65user 0.08system 0:02.75elapsed 99%CPU (0avgtext+0avgdata 291384maxresident)k
0inputs+56outputs (0major+65477minor)pagefaults 0swaps

I don't know what is wrong here. Maybe I do not understand 'max_overlaps' clearly. Could you please explain exactly what it means?

tp227 avatar May 19 '17 09:05 tp227

@tp227 I've recently loaded and run my own dataset.

In my case I changed _load_pascal_annotation in the pascal_voc.py file to read a simple text format instead of the XML format, because XML is quite verbose.

The function will return boxes, gt_classes, gt_overlaps, seg_areas.

gt_overlaps and seg_areas are calculated, so the only things you need to worry about are the box coordinates and the classes.
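As a rough illustration of that idea, here is a minimal sketch of such a loader. The text format (one `class_name x1 y1 x2 y2` line per object), the class tuple, and the name `load_text_annotation` are all assumptions made for this example, and plain Python lists stand in for the array types the repository uses.

```python
# Illustrative class list; index 0 is reserved for the background.
CLASSES = ('__background__', 'figure', 'formula', 'table')
CLASS_TO_IND = {name: i for i, name in enumerate(CLASSES)}


def load_text_annotation(text):
    """Parse lines of the form 'class_name x1 y1 x2 y2' (hypothetical format)."""
    boxes, gt_classes, gt_overlaps, seg_areas = [], [], [], []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        name, x1, y1, x2, y2 = line.split()
        x1, y1, x2, y2 = (int(v) for v in (x1, y1, x2, y2))
        cls = CLASS_TO_IND[name]
        boxes.append([x1, y1, x2, y2])
        gt_classes.append(cls)
        # A ground-truth box overlaps fully with its own class and no other.
        row = [0.0] * len(CLASSES)
        row[cls] = 1.0
        gt_overlaps.append(row)
        # Area in pixels, counting both end points of each side.
        seg_areas.append((x2 - x1 + 1) * (y2 - y1 + 1))
    return {'boxes': boxes, 'gt_classes': gt_classes,
            'gt_overlaps': gt_overlaps, 'seg_areas': seg_areas}


print(load_text_annotation("table 10 20 30 40\nfigure 5 5 50 60\n"))
```

Class names containing spaces would break the simple `split()` used here, so a real format would need a different separator.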

The error you get above comes from the data validation function, which probably indicates that your training data isn't correct. Maybe you have boxes that are not inside the image, or they might be inverted so that they have no area, for instance if x1 > x2, and so on.

kalaspuffar avatar May 24 '17 05:05 kalaspuffar

@tp227 I have the same error. How did you solve the problem? Thanks.

w39865008 avatar Jun 03 '17 01:06 w39865008

@tp227 @w39865008 I have successfully run this framework.
I downloaded the repository again to make sure all files were up to date. tf-faster-rcnn/lib/datasets/pascal_voc.py is the only file I modified: I replaced self._classes with the classes of my data. There is no need to change the number of classes in the networks. The most important thing is to confirm that your data is in the correct pascal voc or coco format. All my earlier errors came from this, even from a difference in carriage returns. If you are confident that your environment is configured correctly, just check whether your data is in the right format. And thanks again to @endernewton for this excellent work.
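The carriage-return problem can be checked with a few lines. This is only a sketch, assuming the image-set listing is a text file with one image name per line; both helper names are hypothetical.

```python
def has_carriage_returns(raw_bytes):
    """True if the raw file content contains Windows-style line endings."""
    return b'\r' in raw_bytes


def read_image_index(text):
    """Split an image-set listing into names, tolerating CRLF and blank lines."""
    return [line.strip() for line in text.splitlines() if line.strip()]


# Example content as it might come from a file saved on Windows.
raw = b'00010001\r\n10001000\r\n'
print(has_carriage_returns(raw))
print(read_image_index(raw.decode('ascii')))
```

A stray `\r` at the end of an image name makes the name differ from the file on disk, which is one way an otherwise correct dataset can fail to load.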

xuewenyuan avatar Jun 12 '17 13:06 xuewenyuan

@xuewenyuan I solved the error by clearing the “default” directory. After you modify files, the “default” and "cache" directories need to be cleared. Thanks for your answer.
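A small helper for this, as a sketch only: the directory names in `STALE_DIRS` are assumptions about where the cached annotations and the "default" outputs live in your checkout, and `clear_dirs` is a hypothetical name.

```python
import os
import shutil


def clear_dirs(dirs):
    """Delete each directory in `dirs` if it exists; return the ones removed."""
    removed = []
    for d in dirs:
        if os.path.isdir(d):
            shutil.rmtree(d)
            removed.append(d)
    return removed


# Hypothetical locations relative to the repository root; adjust before use.
STALE_DIRS = ['data/cache', 'output/default']

# Uncomment to actually delete the directories:
# print(clear_dirs(STALE_DIRS))
```

The call is left commented out on purpose, since it deletes files; check the paths against your own directory layout first.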

w39865008 avatar Jun 12 '17 13:06 w39865008

I am getting an error:

Fix VGG16 layers..
Fixed.
iter: 20 / 70000, total loss: 0.839219 rpn_loss_cls: 0.482955 rpn_loss_box: 0.259976 loss_cls: 0.071895 loss_box: 0.024393 lr: 0.001000 speed: 0.752s / iter
iter: 40 / 70000, total loss: 0.875337 rpn_loss_cls: 0.423560 rpn_loss_box: 0.350305 loss_cls: 0.086583 loss_box: 0.014890 lr: 0.001000 speed: 0.618s / iter
iter: 60 / 70000, total loss: 1.133546 rpn_loss_cls: 0.348662 rpn_loss_box: 0.595944 loss_cls: 0.135538 loss_box: 0.053402 lr: 0.001000 speed: 0.508s / iter
iter: 80 / 70000, total loss: 0.629528 rpn_loss_cls: 0.403030 rpn_loss_box: 0.180778 loss_cls: 0.034033 loss_box: 0.011688 lr: 0.001000 speed: 0.455s / iter
iter: 100 / 70000, total loss: 0.750905 rpn_loss_cls: 0.374346 rpn_loss_box: 0.299970 loss_cls: 0.076589 loss_box: 0.000000 lr: 0.001000 speed: 0.435s / iter
/home/gabbar/ML/tf-faster-rcnn/tools/../lib/model/bbox_transform.py:28: RuntimeWarning: invalid value encountered in log
  targets_dw = np.log(gt_widths / ex_widths)
iter: 120 / 70000, total loss: nan rpn_loss_cls: 0.681892 rpn_loss_box: nan loss_cls: 1.877977 loss_box: 0.000000 lr: 0.001000 speed: 0.416s / iter
iter: 140 / 70000, total loss: nan rpn_loss_cls: 0.672521 rpn_loss_box: nan loss_cls: 1.581246 loss_box: 0.000000 lr: 0.001000 speed: 0.402s / iter

I am getting nan for rpn_loss_box after the RuntimeWarning.
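The warning points at `np.log(gt_widths / ex_widths)`, and the logarithm of a negative ratio is NaN. Below is a minimal sketch of how a bad ground-truth box could produce that, assuming widths are computed as `x2 - x1 + 1`; it uses the standard `math` module instead of NumPy, and `width_target` is a hypothetical name, not the repository's function.

```python
import math


def width_target(gt_box, ex_box):
    """Log-ratio of ground-truth width to anchor width, mimicking NumPy's results."""
    gt_w = gt_box[2] - gt_box[0] + 1.0
    ex_w = ex_box[2] - ex_box[0] + 1.0
    ratio = gt_w / ex_w
    if ratio > 0:
        return math.log(ratio)
    # np.log gives -inf for 0 and nan for negative input; math.log would raise.
    return float('-inf') if ratio == 0 else float('nan')


anchor = (100, 100, 227, 227)
print(width_target((110, 90, 200, 230), anchor))  # valid box: finite value
print(width_target((300, 90, 200, 230), anchor))  # xmin > xmax: nan
```

Once one target is NaN, the box loss and the total loss become NaN as well, so checking the annotations for inverted or out-of-range boxes is a reasonable first step.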

abhiML avatar Jun 14 '17 13:06 abhiML

@kalaspuffar I saw your approach in #95. Would you describe it in detail?
1. How do I modify the _load_pascal_annotation function in pascal_voc.py?
2. How do I update the factory.py file to include a new dataset?
3. How do I update ./experiments/scripts/train_faster_rcnn.sh and ./experiments/scripts/test_faster_rcnn.sh to include the dataset?
It would be best if you could provide your code or a detailed guide. Thank you for your reply!

wmf1991yeah avatar Jul 18 '17 03:07 wmf1991yeah

import os
import pandas as pd

trainset_path = '/bigdisk/.../paris_trainset.pkl'  # added
testset_path = '/bigdisk/.../paris_testset.pkl'  # added
dataset_path = '/bigdisk/.../paris/'  # added

if not os.path.exists(trainset_path) or not os.path.exists(testset_path):
    # Build the image lists once and cache them as pickles.
    trainset_dir = os.path.join(dataset_path, 'train2014')
    testset_dir = os.path.join(dataset_path, 'val2014')

    # List comprehensions instead of map(), which is lazy on Python 3.
    trainset = pd.DataFrame({'image_path': [os.path.join(trainset_dir, x) for x in os.listdir(trainset_dir)]})
    testset = pd.DataFrame({'image_path': [os.path.join(testset_dir, x) for x in os.listdir(testset_dir)]})

    trainset.to_pickle(trainset_path)
    testset.to_pickle(testset_path)
else:
    # Reuse the cached lists.
    trainset = pd.read_pickle(trainset_path)
    testset = pd.read_pickle(testset_path)

This code can help you build your own .pkl dataset from images, and read it back if it has already been built.


JacobLee121 avatar Jul 19 '17 04:07 JacobLee121

@kalaspuffar Hi, can you give me some advice on how you ran this on your own dataset? Do I need to download the pre-trained model and weights? I tried the steps the author describes, but I got 0 AP.

zdm123 avatar Oct 02 '17 05:10 zdm123

@xuewenyuan Hi, I converted my dataset to the same format as PASCAL VOC, but I got 0 AP. I want to use the code for logo detection. Do I still need to download the pre-trained model? Can you please give me some advice?

zdm123 avatar Oct 02 '17 05:10 zdm123

@zdm123 Hi, I also have this problem. Have you solved it? If you have, please help me. Thank you so much!

zqdeepbluesky avatar Jan 12 '18 16:01 zqdeepbluesky

In demo.py, line 144, change the '21' to len(CLASSES).

Run demo.py again. It worked. :D
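The hard-coded 21 is the PASCAL VOC class count (20 object classes plus background), which would explain shape mismatches such as [4096,3] versus [4096,21] when the checkpoint and the network disagree on the number of classes. The sketch below only illustrates the arithmetic: the class tuple is an example, `head_shapes` is a hypothetical helper, and the `4 * num_classes` box-regression width is an assumption based on the usual Faster R-CNN head.

```python
# Example class list; replace with the classes of your own dataset.
CLASSES = ('__background__', 'figure', 'formula', 'table')


def head_shapes(num_classes, fc_dim=4096):
    """Weight shapes of the final classification and box-regression layers."""
    return {'cls_score': (fc_dim, num_classes),
            'bbox_pred': (fc_dim, 4 * num_classes)}


print(head_shapes(21))            # shapes for the 20-class VOC setup
print(head_shapes(len(CLASSES)))  # shapes needed for this class list
```

Deriving the number from `len(CLASSES)` keeps the network and the class list in sync when the classes change.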

f27ny105t5123 avatar Jan 25 '18 07:01 f27ny105t5123

It is perhaps due to errors in the "bbox" coordinates (x < 0 or x > img_width) in your Annotations. (At least that was so in my case.)
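A quick way to look for such boxes is to scan the annotation files before training. The following is a sketch under assumptions: it expects PASCAL VOC style XML with 1-based integer coordinates, `find_bad_boxes` is a hypothetical name, and the second object in the sample is made up to trigger the check.

```python
import xml.etree.ElementTree as ET


def find_bad_boxes(xml_text):
    """Return (class_name, box, reason) for each suspicious box in a VOC-style XML string."""
    root = ET.fromstring(xml_text)
    size = root.find('size')
    width = int(size.find('width').text)
    height = int(size.find('height').text)
    problems = []
    for obj in root.iter('object'):
        name = obj.find('name').text
        bb = obj.find('bndbox')
        box = tuple(int(bb.find(k).text) for k in ('xmin', 'ymin', 'xmax', 'ymax'))
        xmin, ymin, xmax, ymax = box
        if xmin >= xmax or ymin >= ymax:
            problems.append((name, box, 'inverted or empty'))
        elif xmin < 1 or ymin < 1 or xmax > width or ymax > height:
            # Assumes 1-based pixel coordinates.
            problems.append((name, box, 'outside image'))
    return problems


SAMPLE = """<annotation>
  <size><width>1200</width><height>900</height><depth>3</depth></size>
  <object><name>no</name>
    <bndbox><xmin>306</xmin><ymin>72</ymin><xmax>379</xmax><ymax>801</ymax></bndbox>
  </object>
  <object><name>no</name>
    <bndbox><xmin>0</xmin><ymin>10</ymin><xmax>1300</xmax><ymax>50</ymax></bndbox>
  </object>
</annotation>"""

for problem in find_bad_boxes(SAMPLE):
    print(problem)
```

Running this over every file in the Annotations folder should list the boxes that need fixing or removing.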

Site1997 avatar Apr 01 '18 09:04 Site1997

When I run pascal_voc.py with only self.classes changed to the classes of my custom dataset, I get the following error:

Traceback (most recent call last):
  File "./tools/trainval_net.py", line 105, in <module>
    imdb, roidb = combined_roidb(args.imdb_name)
  File "./tools/trainval_net.py", line 76, in combined_roidb
    roidbs = [get_roidb(s) for s in imdb_names.split('+')]
  File "./tools/trainval_net.py", line 76, in <listcomp>
    roidbs = [get_roidb(s) for s in imdb_names.split('+')]
  File "./tools/trainval_net.py", line 69, in get_roidb
    imdb = get_imdb(imdb_name)
  File "/hdd2/k21993/tf-faster-rcnn-visdrone/tools/../lib/datasets/factory.py", line 55, in get_imdb
    return __sets[name]()
  File "/hdd2/k21993/tf-faster-rcnn-visdrone/tools/../lib/datasets/factory.py", line 30, in <lambda>
    __sets[name] = (lambda split=split, year=year: visdrone(split, year))
  File "/hdd2/k21993/tf-faster-rcnn-visdrone/tools/../lib/datasets/visdrone.py", line 44, in __init__
    self.classes = ('background''ignored regions', 'pedestrian', 'people', 'bicycle','car', 'van', 'truck', 'tricycle', 'awning-tricycle','bus', 'motor', 'others')
AttributeError: can't set attribute
Command exited with non-zero status 1
4.21user 3.18system 0:03.68elapsed 200%CPU (0avgtext+0avgdata 241028maxresident)k
8inputs+24outputs (0major+84043minor)pagefaults 0swaps

Can someone please help me with this issue?

Karthik-Suresh93 avatar May 28 '18 19:05 Karthik-Suresh93

1. I have successfully run the demo, but the terminal shows the following:

CUDA driver version is insufficient for CUDA runtime version

What should I do?

2. When I run:

./experiments/scripts/train_faster_rcnn.sh 0 pascal_voc vgg16

I get an error like this:

Traceback (most recent call last):
  File "./tools/trainval_net.py", line 97, in <module>
    cfg_from_list(args.set_cfgs)
  File "/home/lc/Desktop/Han/API/tf-faster-rcnn-master/tools/../lib/model/config.py", line 386, in cfg_from_list
    type(value), type(d[subkey]))
AssertionError: type <class 'int'> does not match original type <class 'list'>
Command exited with non-zero status 1
1.63user 0.36system 0:01.74elapsed 114%CPU (0avgtext+0avgdata 270900maxresident)k
0inputs+8outputs (0major+81874minor)pagefaults 0swaps

Could you tell me in detail how to solve this? Thanks very much!

ChengxiHAN avatar Oct 08 '18 14:10 ChengxiHAN

Please check the variable self.classes to see whether it contains any mistake.
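Two things in the reported `self.classes = (...)` line look suspicious. The AttributeError suggests that `classes` is a read-only property, so the private `_classes` attribute would have to be assigned instead; and `'background''ignored regions'` has no comma, so Python joins the two strings into one name. The sketch below reproduces both effects with a stand-in base class; `Imdb` and `MyDataset` are hypothetical, simplified classes, not the repository's code.

```python
class Imdb:
    """Stand-in for a dataset base class with a read-only `classes` property."""

    def __init__(self):
        self._classes = ()

    @property
    def classes(self):
        return self._classes


class MyDataset(Imdb):
    def __init__(self):
        super().__init__()
        # Assign the private attribute; `self.classes = ...` would raise AttributeError.
        self._classes = ('__background__', 'pedestrian', 'people')


ds = MyDataset()
print(ds.classes)
try:
    ds.classes = ('__background__',)
except AttributeError as err:
    print('AttributeError:', err)

# Adjacent string literals without a comma are concatenated into one name.
print(('background' 'ignored regions', 'pedestrian'))
```

The exact wording of the AttributeError message depends on the Python version, but the cause is the same: a property without a setter.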

rnsandeep avatar Oct 08 '18 16:10 rnsandeep

In pascal_voc.py, I changed the original classes into my classes. Is there anything which should be modified?

self._classes = ('__background__',  # always index 0
                  'aeroplane', 'bicycle', 'bird', 'boat',
                  'bottle', 'bus', 'car', 'cat', 'chair',
                  'cow', 'diningtable', 'dog', 'horse',
                  'motorbike', 'person', 'pottedplant',
                  'sheep', 'sofa', 'train', 'tvmonitor')
self._classes = ('__background__',  # always index 0
                      'figure', 'formula', 'table')

Have you figured out this problem? I encountered the same one. I trained on my own dataset before, after modifying some files (this file included), and it worked. Now I am training on another dataset, and it does not work.

Emma-uestc avatar May 07 '19 05:05 Emma-uestc

@Emma-uestc try to clear the file “default” and "cache".

xuewenyuan avatar May 07 '19 09:05 xuewenyuan

@Emma-uestc try to clear the file “default” and "cache".

I did clear the cache and output directories each time. Do you have any idea how to use a GPU that is not included in the GPU list for this project? How should I set the gpu-arch?

Emma-uestc avatar May 07 '19 10:05 Emma-uestc

I am getting nan on rpn loss box after the RunTime Warning.

@abhiML Hi, I got the same problem. How did you fix it? I'm struggling with tons of NaNs...

ZhuanShan avatar Jun 19 '19 10:06 ZhuanShan

I am following the steps from the repository webpage, and ran the default test script, but I get zeros everywhere.

GPU_ID=0 ./experiments/scripts/test_faster_rcnn.sh $GPU_ID pascal_voc_0712 res101

... ...
Reading annotation for 4950/4952
Reading annotation for 4951/4952
Reading annotation for 4952/4952
Saving cached annotations to /home/sagarwal/tf-faster-rcnn/data/VOCdevkit2007/annotations_cache/test_annots.pkl
AP for aeroplane = 0.0000
AP for bicycle = 0.0000
AP for bird = 0.0000
AP for boat = 0.0000
AP for bottle = 0.0000
AP for bus = 0.0000
AP for car = 0.0000
AP for cat = 0.0000
AP for chair = 0.0000
AP for cow = 0.0000
AP for diningtable = 0.0000
AP for dog = 0.0000
AP for horse = 0.0000
AP for motorbike = 0.0000
AP for person = 0.0000
AP for pottedplant = 0.0000
AP for sheep = 0.0000
AP for sofa = 0.0000
AP for train = 0.0000
AP for tvmonitor = 0.0000
Mean AP = 0.0000

Results: 0.000 0.000 0.000 0.000 ... ...

Let me know what I am doing wrong.

shankar-agarwal avatar Nov 29 '19 08:11 shankar-agarwal