
YOLO

Open senarvi opened this pull request 2 years ago • 7 comments

A generic YOLO implementation that supports the most important features of YOLOv3, YOLOv4, YOLOv5, YOLOv7, Scaled-YOLOv4, and YOLOX. It includes networks written in PyTorch, but the user can also load a network from a Darknet configuration file. Features such as matching predictions to targets are implemented in a modular way, so that they can easily be replaced or reused in different models. Target class labels may be specified as a matrix of class probabilities, allowing multi-label classification. Includes unit tests and complete type hints for static type checking.
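To illustrate the multi-label targets, here is a hypothetical sketch of what a class-probability matrix could look like. The `"boxes"`/`"labels"` keys and shapes are assumptions for illustration and may not match the PR's exact interface:

```python
import torch

# Hypothetical target format: one row of class probabilities per box.
# The dictionary keys and shapes are assumptions, not the PR's exact API.
num_boxes, num_classes = 2, 80
target = {
    "boxes": torch.tensor([[10.0, 20.0, 110.0, 220.0],
                           [50.0, 60.0, 150.0, 160.0]]),  # (x1, y1, x2, y2)
    # Each row can assign probability mass to several classes.
    "labels": torch.zeros(num_boxes, num_classes),
}
target["labels"][0, 0] = 1.0   # first box: class 0 only
target["labels"][1, 2] = 0.7   # second box: mostly class 2...
target["labels"][1, 5] = 0.3   # ...with some probability on class 5
```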

This code is contributed with the permission of my employer, Groke Technologies.

Fixes #6341

senarvi avatar Apr 04 '23 13:04 senarvi

:link: Helpful Links

:test_tube: See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/vision/7496

Note: Links to docs will display an error until the docs builds have been completed.

:heavy_exclamation_mark: 2 Active SEVs

There are 2 currently active SEVs. If your PR is affected, please view them below:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

pytorch-bot[bot] avatar Apr 04 '23 13:04 pytorch-bot[bot]

cc @NicolasHug

oke-aditya avatar Apr 05 '23 05:04 oke-aditya

Seems like I was able to fix most of the unit tests. The biggest job was getting the TorchScript compilation working. In order to fix it, I had to make the code a bit uglier in some places. For example, the target matching classes cannot be subclasses of a common base class, because JIT compilation can't handle inheritance. Also, the IoU function and the cross-entropy function cannot be passed as function objects, which makes the loss computation a bit awkward. I don't know if there's a way to make function objects work.
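For illustration, here is a minimal sketch (not code from this PR) of the kind of workaround TorchScript forces: selecting the IoU variant by a string attribute instead of storing a function object. The class and argument names are made up, and exact scriptability depends on the torchvision version:

```python
import torch
from torch import Tensor
from torchvision.ops import box_iou, generalized_box_iou

class IoULoss(torch.nn.Module):
    # Selects the IoU variant by name, because TorchScript cannot script
    # a module that stores an arbitrary callable as an attribute.
    def __init__(self, iou_type: str = "iou") -> None:
        super().__init__()
        self.iou_type = iou_type

    def forward(self, boxes1: Tensor, boxes2: Tensor) -> Tensor:
        # boxes1 and boxes2 are matched pairs of (x1, y1, x2, y2) boxes,
        # so the diagonal of the NxN overlap matrix holds the pairwise IoUs.
        if self.iou_type == "giou":
            overlap = generalized_box_iou(boxes1, boxes2)
        else:
            overlap = box_iou(boxes1, boxes2)
        return 1.0 - overlap.diagonal()

# Branching on a string constant compiles, whereas calling a stored
# function object (e.g. self.iou_fn) would fail to script.
scripted = torch.jit.script(IoULoss("giou"))
```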

senarvi avatar Apr 20 '23 16:04 senarvi

I'm attempting to test these models by training them on the COCO dataset; however, I am unable to get close to the results reported in the original papers, for example YOLOv7 here:

https://github.com/WongKinYiu/yolov7?tab=readme-ov-file#performance

I'm wondering if I'm doing something wrong in training them. Can you provide any guidance on how you train them and what results you are seeing?

patches11 avatar May 01 '24 21:05 patches11

Hi @patches11 . I haven't trained YOLO models recently, and I no longer have access to a compute cluster for training them. I did check earlier that I can use YOLOv4 weights, so the forward pass should be correct. But there are lots of details used in model training, like mosaic and copy-paste augmentation, and I feel like not all of them are mentioned in the papers. I'm not even sure how exactly the models were tested (what data and resolution were used). I also found gradient clipping to be important, even though it's not mentioned in the papers. Maybe @FateScript can comment on what we're still missing?
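For reference, gradient clipping in plain PyTorch is a one-liner before the optimizer step. This is a generic sketch, not the training loop from this PR; the loss-dict convention and the `max_norm` value are assumptions:

```python
import torch

def training_step(model, optimizer, images, targets):
    losses = model(images, targets)  # assumed to return a dict of loss terms
    loss = sum(losses.values())
    optimizer.zero_grad()
    loss.backward()
    # Clip the global gradient norm before stepping; 10.0 is an
    # illustrative value, not a setting from the papers or this PR.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=10.0)
    optimizer.step()
    return loss
```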

senarvi avatar May 02 '24 17:05 senarvi

@senarvi thanks for the details, I will take a look at implementing those augmentations.

It definitely seems like we don't get all the details in the papers.

patches11 avatar May 03 '24 15:05 patches11

Yeah, it would be interesting to see if we can get all the augmentations and training tricks exactly as in the papers. That way we could get a fair comparison between YOLO and other architectures. I think YOLOv7 uses a 1280x1280 input size that consists of four 640x640 tiles, so for each network input you sample four images, which makes it a bit more complicated (see the sketch below). For the copy-paste augmentation we need segmentation masks in addition to the bounding boxes. I think during testing they sort the images so that each batch contains images of as similar sizes as possible, so you don't have to resize them as much. If I'm right, that means the test results vary a little depending on the batch size.
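To make the four-tile idea concrete, here is a minimal mosaic-augmentation sketch. It assumes the four images are already resized to tile_size x tile_size and that boxes are absolute (x1, y1, x2, y2) coordinates; real implementations also jitter the mosaic center and crop, which this omits:

```python
import torch
from torch import Tensor
from typing import List, Tuple

def mosaic(images: List[Tensor], boxes: List[Tensor],
           tile_size: int = 640) -> Tuple[Tensor, Tensor]:
    # Tile four images into one (2 * tile_size) square network input.
    canvas = torch.zeros(3, tile_size * 2, tile_size * 2)
    offsets = [(0, 0), (0, tile_size), (tile_size, 0), (tile_size, tile_size)]
    shifted_boxes = []
    for image, image_boxes, (top, left) in zip(images, boxes, offsets):
        canvas[:, top:top + tile_size, left:left + tile_size] = image
        # Shift each image's boxes by its tile offset within the canvas.
        shifted = image_boxes.clone()
        shifted[:, 0] += left  # x1
        shifted[:, 2] += left  # x2
        shifted[:, 1] += top   # y1
        shifted[:, 3] += top   # y2
        shifted_boxes.append(shifted)
    return canvas, torch.cat(shifted_boxes)
```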

senarvi avatar May 06 '24 19:05 senarvi