
Can Yolo3 take different width-height-ratio images as training input?

Open jiqiyang opened this issue 6 years ago • 23 comments

Images from VOC and some other datasets do not all share exactly the same width-height ratio. For example, in VOC2012 some images are 334x500, some are 500x332, some are 486x500. In the KITTI dataset, the width is always roughly 3 times the height (1200x300).

I don't see any fully connected layers in yolo3. Does this mean that yolo3 can take images with different width-height ratios as training input?

Or do I need to crop the images to the same size, or apply the SPP-Net technique to yolo3 before training? If SPP-Net is needed, before which yolo3 layer should I apply it?

jiqiyang avatar May 18 '18 18:05 jiqiyang

  1. Yolo v1/v2/v3 can take images of different width/height/ratio as training/validation/test input.

  2. Fully connected layers don't make a network invariant to aspect ratio. Fully connected layers only increase the receptive field of each final activation to the full image size. But in Yolo v3 each final activation in the first [yolo] layer already has a large receptive field.

AlexeyAB avatar May 18 '18 18:05 AlexeyAB

Thank you, @AlexeyAB .

Question 1

I also notice that there is a way to configure "dim" in darknet/src/detector.c#L87.

just change these 2 lines to: https://github.com/AlexeyAB/darknet/blob/5bc62b14e06a3fcfda4e3a19fba77589920eddee/src/detector.c#L87

    args.w = dim*2;                         // network width = 2x the height
    resize_network(nets + i, dim*2, dim);   // resize to a 2:1 (w:h) input

(https://groups.google.com/forum/#!topic/darknet/HrkhOhxCgLk).

If Yolo v1/v2/v3 can take images of different width/height/ratio as training/validation/test input, then what is the point of configuring something like "dim*2"?

Does this mean that I should just keep the original code when I am combining images of different widths/heights/ratios as my training data?

    args.w = dim;
    args.h = dim;
    for(i = 0; i < ngpus; ++i){
        resize_network(nets[i], dim, dim);
    }
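
(For context, these lines sit inside the multi-scale training branch of detector.c, where dim is re-drawn every few iterations when random=1 is set; the surrounding logic is roughly as follows, paraphrased rather than copied verbatim:)

    // rough sketch of the surrounding logic in train_detector():
    // when random=1, every 10 iterations a new network size that is
    // a multiple of 32 is chosen
    if(l.random && count++ % 10 == 0){
        int dim = (rand() % 10 + 10) * 32;   // one of 320, 352, ..., 608
        args.w = dim;
        args.h = dim;
        for(i = 0; i < ngpus; ++i){
            resize_network(nets[i], dim, dim);
        }
    }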

Question 2

I am also confused by another thing. The instructions from the Google group mention that "detector.c" should be in the src folder (https://github.com/AlexeyAB/darknet/tree/5bc62b14e06a3fcfda4e3a19fba77589920eddee/src), but I can only find "detector.c" in the examples folder (https://github.com/pjreddie/darknet/tree/master/examples). Should I just leave detector.c in the examples folder if I am using pjreddie's yolo3 repo (https://github.com/pjreddie/darknet)?

jiqiyang avatar May 18 '18 21:05 jiqiyang

  1. Very simply put, Yolo can take images of different width/height/ratio as input data. But the more the width/height/ratio differs between the training and testing datasets, the worse it detects. To avoid this, there is data augmentation (see the cfg sketch below):
  • jitter - randomly resizes training images
  • random - randomly resizes the network via resize_network(nets[i], dim, dim);
  2. Joseph moved it from src to the examples folder, so just keep it there.
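
For reference, a minimal .cfg sketch showing where these two knobs live (the values are the yolov3.cfg defaults, shown for illustration only):

    [net]
    # fixed network input resolution; every image, whatever its
    # aspect ratio, is resized to this shape on input
    width=416
    height=416

    [yolo]
    # jitter: randomly crops/resizes training images
    jitter=.3
    # random=1: resizes the whole network to a random multiple
    # of 32 every few iterations (multi-scale training)
    random=1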

AlexeyAB avatar May 18 '18 22:05 AlexeyAB

@AlexeyAB Hi AlexeyAB, does this mean that I do not have to resize my images, since the network has jitter (randomly resizes images) and random (randomly resizes the network via resize_network(nets[i], dim, dim))?

I am planning to train images from drones using YOLOv3, and I would like to ask if resizing the images would help the detector become more accurate. If so, what size would be recommended?

I appreciate your help, thank you!

danieltwx avatar Mar 04 '19 11:03 danieltwx

@danieltwx Hi, you shouldn't resize images.

AlexeyAB avatar Mar 04 '19 11:03 AlexeyAB

@AlexeyAB I have trained the network with cfg [net]: width=416, height=416. The images in the training dataset provided to the network are of different sizes, since everyone suggested not to resize images while training; the network will do it itself. (Training was done to loss=0.5.)

  • At prediction time, some of the images have a ratio of 1:3. Will the model work for them, or do I need to resize them to 1:1?

I appreciate your help, thank you!

amankumarjain avatar Apr 02 '19 19:04 amankumarjain

@AlexeyAB Hi Alexey,

I'm training YOLOv3 on a dataset with just 1 object to be detected and classified per image (classes=4). The object is a rectangle that almost always takes up 80-95% of the image space (it is a business card). The aspect ratio of the images is approximately 1:1.5.

Given that the borders of the object are very close to the limits of the image (sometimes even touching them), I've set width=640, height=416 in my .cfg file, for the moment.

Is it safe to set both width and height to 416 as recommended? Or am I risking losing valuable information due to how close the object is to the image limits?

Thanks for your great contribution and support to the community!

MurreyCode avatar May 08 '19 15:05 MurreyCode

Hello, I'm a beginner in AI. Currently I am importing YOLOv3 to ONNX, so could anyone please share a sample .cpp file that imports yolov3?

Thanks

hoaquocphan avatar Mar 03 '20 09:03 hoaquocphan

@MurreyCode you don't need to adjust height and width differently in your config or resize your dataset images. The YOLO architecture does it by itself, keeping the aspect ratio safe (no information will be ignored) according to the resolution in the .cfg file. For example, if you have an image sized 1248 x 936, YOLO will resize it to 416 x 312 and then pad the extra space with black bars to fit into a 416 x 416 network. The arithmetic is sketched below.
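
A small self-contained sketch of that letterbox arithmetic (this illustrates the idea described above; it is not the exact darknet source):

    #include <stdio.h>

    /* Scale an image to fit inside net_w x net_h while keeping its
       aspect ratio; the rest of the network input becomes padding. */
    static void letterbox_dims(int img_w, int img_h, int net_w, int net_h,
                               int *new_w, int *new_h)
    {
        if ((float)net_w / img_w < (float)net_h / img_h) {
            *new_w = net_w;
            *new_h = (img_h * net_w) / img_w;
        } else {
            *new_h = net_h;
            *new_w = (img_w * net_h) / img_h;
        }
    }

    int main(void)
    {
        int w, h;
        letterbox_dims(1248, 936, 416, 416, &w, &h);
        printf("%d x %d\n", w, h);  /* 416 x 312; the other 104 rows are padding */
        return 0;
    }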

pullmyleg avatar Apr 06 '20 05:04 pullmyleg

> @MurreyCode you don't need to adjust height and width differently in your config or resize your dataset images. [...]

For which version?

erolgerceker avatar Sep 11 '20 08:09 erolgerceker

Hi Alexey, great work on yolo! Can I train my yolov5 on a custom image size, i.e. 720x720 or higher? If yes, do I need to train for a larger number of steps, or can I stick to the steps calculated from the number of classes, as in the config file?

Looking forward to your answer and help on this. Regards, Mahesh

maheshmechengg avatar Mar 03 '21 09:03 maheshmechengg

@maheshmechengg you can increase resolution as much as you like as long as it’s divisible by 32.

But you will need to decrease your batch size. As you increase your training resolution, the images take up more memory on your GPU, so you need to decrease your batch size to let them fit in GPU memory.

Decreasing the batch size does slowly decrease accuracy. But in my experience a higher resolution (to an extent) with a decreased batch size results in better accuracy.

You adjust the effective batch size by increasing subdivisions in your config, as per the instructions for the out-of-memory issue, if or when it arises as you increase your resolution (see the sketch below).

The number of steps or iterations does not need to increase along with a resolution increase.
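
A hedged example of the relevant [net] lines (the numbers are illustrative, not tuned values):

    [net]
    width=704         # must be divisible by 32
    height=704        # must be divisible by 32
    batch=64
    # each mini-batch loaded onto the GPU holds batch/subdivisions
    # images; raise subdivisions if you hit out-of-memory errors
    subdivisions=32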

pullmyleg avatar Mar 03 '21 09:03 pullmyleg

> @maheshmechengg you can increase resolution as much as you like as long as it's divisible by 32. [...] The number of steps or iterations does not need to increase along with a resolution increase.

Yes, thanks, I did it the same way as you said.

maheshmechengg avatar Mar 03 '21 10:03 maheshmechengg

Hi Alexey,

I have a question about the .cfg file of YoloV3. How does changing the width or height affect the model? Isn't it taking fixed-shape images as input? When I increase the height and the width while testing, the model's detection score increases and its fps decreases, and frankly I couldn't find the reason.

Thanks in advance.

sekomer avatar Nov 17 '21 15:11 sekomer

> I have a question about the .cfg file of YoloV3. How does changing the width or height affect the model? [...]

@Sekomer increasing the height and width increases the number of pixels the model can use to detect objects.

More pixels equates to better accuracy, because there is more detail in the image for the model to utilise.

An image sized 100x100 px has far less detail than an image at 1000x1000 px.

It runs slower because the model needs to scan across more pixels; see the rough arithmetic below.

Suggest reading this: https://www.section.io/engineering-education/introduction-to-yolo-algorithm-for-object-detection/
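
For rough intuition about the speed cost: YOLOv3's coarsest [yolo] head has stride 32, so the detection grid grows with the input resolution, e.g.:

    416 x 416 input  ->  coarsest grid 416/32 x 416/32 = 13 x 13 cells
    832 x 832 input  ->  coarsest grid 832/32 x 832/32 = 26 x 26 cells

Four times the cells at every scale means roughly four times the computation.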

pullmyleg avatar Nov 17 '21 18:11 pullmyleg

> More pixels equates to better accuracy, because there is more detail in the image for the model to utilise. [...]

First, thanks for your answer.

We're on the same page about what you said, but what I don't understand is what changes when I double the h and w values in the cfg file during testing. Does it split the image into 4 subimages and iterate over them, or do some other black magic? I want to understand this.

sekomer avatar Nov 18 '21 11:11 sekomer

@AlexeyAB

Hi people. I'm using yolov4 to train on 5K images of 3180 x 2160 for object detection, 1 class. The training seems to complete successfully (mAP@0.5 = 98%), but the problem comes when I run inference on some images. The training charts also look OK.

[image: chart_yolov4-custom training chart]

The problem is that when I run inference, the predicted bounding box is shifted from the real object, by around 100 pixels in X and Y respectively. The object seems to be recognized, but the BB is not located in exactly the right position.

I have in my yolov4-custom.cfg: batch=64 subdivisions=32 width=416 height=416

Do you think this shift could be because my training images are non-square (3180 x 2160, roughly a 1:1.5 proportion), while the width and height values in the .cfg are a 1:1 proportion (416x416)? Could this mismatch be responsible for such a shift in the predicted bounding box?

Any light or hints to clarify this would be extremely helpful, thanks. Take care.

vongracia avatar Jun 02 '22 12:06 vongracia

Hi @vongracia

A few things:

  • 416x416 is quite a low resolution.
  • Use a resolution with a similar aspect ratio, e.g. 640 x 480 (or larger if you want to increase mAP). Note that the resolution in your config must be divisible by 32.
  • In your config, underneath the 'scales' line, add letter_box=1. This will maintain the aspect ratio of your images during training and will likely solve your issue.

If the above doesn't solve your issue (it should), you can do the following to increase bounding-box tightness (see the cfg sketch after this list):

  • To make the detected bounding boxes more accurate, you can add 3 parameters, ignore_thresh = .9 iou_normalizer=0.5 iou_loss=giou, to each [yolo] layer and train; this will increase mAP@0.9, but decrease mAP@0.5.
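
Putting both suggestions into a single hedged .cfg sketch (the resolution is the example from above; all values are illustrative):

    [net]
    width=640         # similar aspect ratio to the source images,
    height=480        # and both values divisible by 32
    ...
    scales=.1,.1
    letter_box=1      # keep the aspect ratio during training

    [yolo]
    # optional: tighter boxes, raising mAP@0.9 at the cost of mAP@0.5
    ignore_thresh = .9
    iou_normalizer=0.5
    iou_loss=giou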

pullmyleg avatar Jun 02 '22 21:06 pullmyleg

Hi @pullmyleg @AlexeyAB

Thanks a lot for the answer. I've tried the things that you proposed and there is no apparent change in the inference; there is still a mismatch. See the following:

[screenshot of the shifted predictions: 2022-06-15 09-50-27]

This is the training chart.

[image: chart_yolov4-custom training chart]

The .cfg file (for training) looks like this. You can see that I changed the width and height to be multiples of 32 and in proportion to the non-square training images (3180x2160), and added letter_box=1. I ALSO added ignore_thresh = .9 iou_normalizer=0.5 iou_loss=giou, and there is still no appreciable change:

[screenshot of the training .cfg: 2022-06-15 09-37-00]

The cfg file (for inference) is the same but batch=1, subdivisions=1.

These are the metrics for the trained model:

[screenshot of the metrics: 2022-06-15 09-41-27]

Could you give me more hints? I do not really understand what is going on here... the metrics do not look bad, do they?

Thanks in advance!

vongracia avatar Jun 15 '22 07:06 vongracia

@vongracia

  1. Are you using this repo? You should be using: https://github.com/AlexeyAB/darknet
  2. Use a tiny yolo model. You can have a bigger batch size and training will be faster with the above repo. https://github.com/AlexeyAB/darknet/blob/master/cfg/yolov4-tiny-custom.cfg
  3. What size image are you running inference on? The large images?
  4. Can you show your data and labels in yolo_label or similar?
  5. You should be training for 5000 - 10000 iterations.

pullmyleg avatar Jun 15 '22 23:06 pullmyleg

> @AlexeyAB Hi people. I'm using yolov4 to train on 5K images of 3180 x 2160 for object detection, 1 class. [...] Could this mismatch be responsible for such a shift in the predicted bounding box?

Can you publish the cfg files, or explain how to train on 360-degree camera images (3180 x 2160 pixels)? Thanks in advance.

saktheeswaranswan avatar Sep 06 '22 13:09 saktheeswaranswan

@saktheeswaranswan Are you using this repo? You should be using: https://github.com/AlexeyAB/darknet

pullmyleg avatar Sep 06 '22 20:09 pullmyleg

@vongracia In this case, I recommend instance segmentation networks.

roxroxroxrox avatar Sep 14 '22 09:09 roxroxroxrox