py-faster-rcnn
Large images with small objects
I am trying to use Faster R-CNN on my own dataset. After changing pascal_voc.py, factory.py, and the models to use the right number of classes, Caffe no longer gives any errors and starts training. However, it doesn't seem to learn anything.
In my dataset I have a small number of high resolution images, with a large number of objects per image (the images are 5000x5000 pixels and the bounding boxes are roughly 25x25).
Does the implementation resize the original images? If yes, where in the code can I find this? If no, do you have other suggestions to what might go wrong?
Apparently images are rescaled to 600x1000 (or 1000x600); see lib/fast_rcnn/config.py. Using bigger values (that still fit on my GPU) didn't solve the problem, though.
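For reference, the resize rule in py-faster-rcnn (prep_im_for_blob in lib/fast_rcnn/blob.py) scales the shortest side to TRAIN.SCALES but caps the longest side at TRAIN.MAX_SIZE. A minimal sketch of that rule (the function name here is mine, not the repo's):

```python
def image_scale(h, w, target_size=600, max_size=1000):
    """Scale factor as computed by py-faster-rcnn's prep_im_for_blob:
    shortest side -> target_size, longest side capped at max_size."""
    im_scale = float(target_size) / min(h, w)
    if round(im_scale * max(h, w)) > max_size:
        im_scale = float(max_size) / max(h, w)
    return im_scale

# A 5000x5000 image is shrunk to 600x600 (scale 0.12),
# so 25x25 boxes become roughly 3x3 pixels -- far too small for the RPN.
print(image_scale(5000, 5000))  # 0.12
```

This is why raising SCALES/MAX_SIZE (or cutting the image into patches) is necessary when the objects are only a few dozen pixels wide.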
@Randdigit Did you solve your issue ?
You have to change the sizes of the anchors so that the RPN learns proposals of the sizes of objects that you care about.
@Supersak80 Does this solve the 600x1000 rescaling issue? In the demo the anchors don't all have the same size. Do you have to use the size of the biggest object you want to detect?
@Austriker No, this doesn't solve the rescaling of the images. That can be changed via the appropriate parameters described in config.py. Read the paper for the sizes of the default anchors used in the code. You basically have to set up the scales and ratios of your anchors to capture all sizes of objects that you wish to detect. Look at generate_anchors.py to get a sense of how the authors do this.
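To make that concrete, here is a simplified sketch of how base_size, ratios, and scales combine into anchor sizes. It is not the repo's exact code (generate_anchors.py rounds in a slightly different order), but the resulting sizes are essentially the same:

```python
import numpy as np

def anchor_sizes(base_size=16, ratios=(0.5, 1, 2), scales=(8, 16, 32)):
    """Approximate (width, height) of each anchor produced by a
    generate_anchors.py-style routine: every ratio/scale combination
    of a base_size square, preserving area per ratio."""
    sizes = []
    for scale in scales:
        side = base_size * scale          # square anchor side in pixels
        area = side * side
        for ratio in ratios:
            w = int(round(np.sqrt(area / ratio)))
            h = int(round(w * ratio))
            sizes.append((w, h))
    return sizes

# Defaults cover roughly 128-512 px objects; for ~25 px objects you
# would need much smaller scales, e.g. scales=(1, 2, 4) -> 16-64 px.
print(anchor_sizes(scales=(2, 4, 8)))
```

If your objects after rescaling are smaller than the smallest anchor, the RPN has almost nothing to match them against, which is consistent with the "learns nothing" symptom reported above.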
@Supersak80 My training pictures are all around 220 pixels in width and height, and each contains a single object of roughly 150 pixels. I want to detect objects of 20-80 pixels. Should I use generate_anchors(base_size=16, ratios=[0.5, 1, 2], scales=2**np.arange(1, 4)) to get anchors of roughly 32, 64, and 128 pixels? And should I also resize with:
# Each scale is the pixel size of an image's shortest side
__C.TRAIN.SCALES = (200,)
# Max pixel size of the longest side of a scaled input image
__C.TRAIN.MAX_SIZE = 224
I have almost the same issue. The difference is that I trained with my own dataset from ImageNet, where the objects are usually very large relative to the image. Detection works fine on new images when running py-faster-rcnn (demo.py), but when the object is too small it is not detected. How can I fix this? Is there any way to detect smaller objects without modifying the dataset? Edit: for instance, the images are 900x600. If the object is big enough (around 300x150) it is detected, but when it is small (around 70x70) nothing is detected.
@paulomarcos @Supersak80 @helxsz @Austriker @Randdigit I am also training with very large images and very small objects. My images are RGB, (5616x3744,3), with object sizes of around (20x10x3). Before even changing the anchor sizes I am hitting another problem. I do not have a GPU memory problem, but I get the following error:
F1214 18:32:32.298799 27802 blob.cpp:33]
Check failed: shape[i] <= 2147483647 / count_ (3744 vs. 663)
blob size exceeds INT_MAX
After searching the internet, I noticed the blob format should be changed to avoid the INT_MAX problem. My images are big (5k x 3k). I do not have a GPU memory problem. How can I solve this? Faster R-CNN uses a Python input layer, not an HDF5 layer, so why does this problem happen?
I am using the RGB implementation. First, I cannot resize the images, since the objects are very small and resizing would hurt accuracy. I also cannot split the images, since the ground truth is provided for the original image, and I thought splitting would be less efficient because I would miss objects at the patch borders. I had already changed the sizes in the following lines from (600, 1000) to (5616, 3744):
# Each scale is the pixel size of an image's shortest side
__C.TRAIN.SCALES = (5616,)
# Max pixel size of the longest side of a scaled input image
__C.TRAIN.MAX_SIZE = 3744
When training the network, I can see the original image is being read:
I1214 18:32:32.067646 27802 net.cpp:150] Setting up input-data
I1214 18:32:32.067694 27802 net.cpp:157] Top shape: 1 3 5616 3744 (63078912)
I1214 18:32:32.067699 27802 net.cpp:157] Top shape: 1 3 (3)
I1214 18:32:32.067703 27802 net.cpp:157] Top shape: 1 4 (4)
I think that if I could increase the blob size limit (above INT_MAX), that would be the most efficient way to solve this.
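For context, the failing check is in Caffe's Blob::Reshape (blob.cpp): each dimension must satisfy shape[i] <= INT_MAX / count_, where count_ is the running product of the previous dimensions, in elements. A sketch of that check, with a shape that plausibly reproduces the "3744 vs 663" numbers in the error above (the im2col buffer of a 3x3 convolution over 64 channels has 64*3*3 = 576 channels; this reconstruction is my guess, not confirmed in the thread):

```python
INT_MAX = 2**31 - 1

def check_blob(shape):
    """Mimic the CHECK in Caffe's Blob::Reshape: the running product of
    dimensions must stay below INT_MAX (element count, not bytes)."""
    count = 1
    for dim in shape:
        if dim > INT_MAX // count:
            raise ValueError("blob size exceeds INT_MAX")
        count *= dim
    return count

print(check_blob((1, 3, 5616, 3744)))  # input blob: 63078912 elements, fine
# But e.g. an im2col buffer (1, 576, 5616, 3744) fails: at the last
# dimension, INT_MAX // (576 * 5616) == 663, and 3744 > 663.
```

So the input blob itself is fine; it is a larger intermediate blob inside the network that overflows, which is why shrinking or tiling the input is the practical fix.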
I have also mentioned this problem in the following post:
Blob size exceeds INT_MAX #3084
Why not scan the original large images with a sliding window (e.g. 600x600)? Then you get training images that fit.
@Solomon1588 That is an option, but not an efficient one. If I chop the images into smaller patches (which I am doing now :) ), I only use about half of the GPU memory I have (~6 GB).
I would like to use all of my GPU's computing power, not just part of it.
@smajida Hi, can you provide some code or infos how you stitched the parts with the detections back together?
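This question goes unanswered in the thread, but a common approach is to shift each patch's detections by the patch's top-left offset back into full-image coordinates, then run NMS across overlapping patches. A minimal sketch (the helper name and box format are hypothetical, not from this repo):

```python
def shift_dets(dets, off_x, off_y):
    """Map patch-local detections (x1, y1, x2, y2, score) back into
    full-image coordinates by adding the patch's top-left offset."""
    return [(x1 + off_x, y1 + off_y, x2 + off_x, y2 + off_y, s)
            for (x1, y1, x2, y2, s) in dets]

# A box at (10, 20) inside the patch cut at image position (y=750, x=1000):
full = shift_dets([(10, 20, 60, 70, 0.9)], off_x=1000, off_y=750)
print(full)  # [(1010, 770, 1060, 820, 0.9)]
```

With overlapping patches, the same object can appear in two patches; applying NMS over the merged, shifted boxes removes those duplicates.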
@smajida Hi! Have you solved the large-imagery object detection problem without rescaling the image? How did you solve it?
Hi, I had the same problem and these are my conclusions at this point:
To me, the best answer was to cut the images in smaller patches, at least for the training phase. According to hardware requirement, you need :
- 3GB of GPU memory for the ZF net
- 8GB of GPU memory for the VGG-16 net
That takes into account the 600x1000 default scaling, so to keep it simple: you need 8GB for 600,000 pixels, assuming you use VGG.
I have 12GB on my GPU, so if this scales linearly I can go up to (600,000 x 12) / 8 = 900,000 pixels at most.
I couldn't resize my images because my objects are small and I couldn't afford to lose resolution.
I chose to cut my 3000x4000 images into 750x1000 patches, the simplest division that goes under 900,000 pixels.
SCALES: [750]
MAX_SIZE: 1000
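The cutting step described above can be sketched as a simple non-overlapping tiling (the function is mine, not from the repo; in practice you would also add some overlap so objects at patch borders are not lost, as noted earlier in the thread):

```python
import numpy as np

def cut_patches(im, ph=750, pw=1000):
    """Split an H x W x C image into non-overlapping ph x pw patches,
    returning ((y_offset, x_offset), patch) pairs."""
    patches = []
    for y in range(0, im.shape[0], ph):
        for x in range(0, im.shape[1], pw):
            patches.append(((y, x), im[y:y + ph, x:x + pw]))
    return patches

im = np.zeros((3000, 4000, 3), dtype=np.uint8)
tiles = cut_patches(im)
print(len(tiles))  # 4 rows x 4 cols = 16 patches
```

Ground-truth boxes must be shifted into each patch's local coordinates accordingly (and boxes straddling a cut either clipped or assigned to an overlapping patch).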
However, the good thing is that you only need to cut the images for the training phase. Then you can apply the trained network to full images thanks to the separate test parameters:
TEST:
SCALES: [3000]
MAX_SIZE: 4000
At least that's what I did, and now I have a network working on 3000x4000 images to detect 100x100 objects, in full C++ thanks to the C++ version.
Hi! Thank you for your reply. I have four 12GB GPU cards; can I use them together? Do you have any experience with that? My training set has 10 object classes across 650 images of size ~400 x ~1000, and my test images are 4000~7000 pixels, with only a few of them. How many images are in your dataset, and roughly how many categories? Thanks!
@smajida Hello! Have you solved the large-imagery object detection problem without rescaling the image?
I had this setting: __C.TRAIN.SCALES = (600,) __C.TRAIN.MAX_SIZE = 1000
and changed it to: __C.TRAIN.SCALES = (3744,) __C.TRAIN.MAX_SIZE = 5616
But I think Faster R-CNN learned nothing; the results are bad. Is cutting the image into small patches the only method? If I want to train on the large image directly, is there a good approach?
@xinboli @Randdigit @smajida Have you solved the large-imagery object detection problem without rescaling the image? I can't detect small objects in my own pictures. Can you give me a hand?
@xinboli which dataset you are using?
I changed it to: __C.TRAIN.SCALES = (1080,) __C.TRAIN.MAX_SIZE = 1920. How do I change generate_anchors.py? @Supersak80
Has anyone considered a way to recover small objects that get lost when pixels are cut off, for example by somehow assigning the small objects a negative value and then working up from there until they are captured?
You have all been discussing this for ages, but what has actually been discussed? There is no substantive help at all.
Then why don't you, oh expert, offer some brilliant insight?