
Mosaic Augmentation Paper?

Open glenn-jocher opened this issue 4 years ago • 6 comments

@WongKinYiu @AlexeyAB great work on the README, there's a wealth of information here! I saw that the mosaic augmentation outperformed all of the rest in your readme tests, including more well known ones like cutmix.

Based on this I wonder if it's worth publishing a short paper to arxiv, and then we can link to this github, alexeyab/darknet and ultralytics/yolov3 for the mosaic dataloader code?

glenn-jocher avatar Jan 20 '20 20:01 glenn-jocher

@glenn-jocher

For your reference: [image]

WongKinYiu avatar Jan 20 '20 22:01 WongKinYiu

@glenn-jocher

I saw that the mosaic augmentation outperformed all of the rest in your readme tests, including more well known ones like cutmix.

Yes. It seems that the more parts of different images are combined, the higher the accuracy. So maybe we can combine more than 4 images into one: 6, 8, 12 or 16.

Based on this I wonder if it's worth publishing a short paper to arxiv, and then we can link to this github, alexeyab/darknet and ultralytics/yolov3 for the mosaic dataloader code?

Yes. It would also be interesting to know what effect this gives when training the detector.

AlexeyAB avatar Jan 20 '20 23:01 AlexeyAB

@WongKinYiu wow, it's odd that Swish hurt the results while Mish helped a lot. Are you sure your Swish implementation was correct? Mish actually provided the largest improvement in your test; it's too bad it's so slow.

@AlexeyAB yes, I was thinking of increasing the 4 images to 9 (in a 3x3 grid), but there are some practical problems with overlap, how to align and build the 3x3 mosaic, etc., whereas the current mosaic is very simple: there is no overlap, and the images all share a common vertex at the center. There would also be diminishing returns for having to load double the images, as currently the grey area is only maybe 10-20% of the image space. [image: train_batch0]
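The 2x2 layout described here (four images, no overlap, one shared vertex at the center) can be sketched roughly as follows. This is a minimal illustration in plain Python, not the dataloader code from ultralytics/yolov3 or alexeyab/darknet. The names `mosaic4` and `grey_fraction`, the grey fill value of 114, and the range the center is drawn from are all assumptions made for this example.

```python
import random

GREY = 114  # assumed fill value for canvas pixels not covered by any image


def mosaic4(images, size, rng=random):
    """Place four images around one shared centre vertex on a 2*size canvas.

    images: four 2-D lists (rows of pixel values), each at most size x size.
    Returns (canvas, (xc, yc)) where (xc, yc) is the shared vertex.
    """
    if len(images) != 4:
        raise ValueError("expected exactly four images")
    side = 2 * size
    canvas = [[GREY] * side for _ in range(side)]
    # Random centre, kept away from the canvas border (assumed range).
    xc = rng.randint(size // 2, side - size // 2)
    yc = rng.randint(size // 2, side - size // 2)
    for i, img in enumerate(images):
        h, w = len(img), len(img[0])
        # Each image touches (xc, yc) with a different corner,
        # so the four quadrants never overlap.
        if i == 0:    # top-left quadrant
            x0, y0 = xc - w, yc - h
        elif i == 1:  # top-right quadrant
            x0, y0 = xc, yc - h
        elif i == 2:  # bottom-left quadrant
            x0, y0 = xc - w, yc
        else:         # bottom-right quadrant
            x0, y0 = xc, yc
        for r in range(h):
            y = y0 + r
            if not 0 <= y < side:
                continue  # row falls outside the canvas and is cropped
            for c in range(w):
                x = x0 + c
                if 0 <= x < side:
                    canvas[y][x] = img[r][c]
    return canvas, (xc, yc)


def grey_fraction(canvas):
    """Fraction of canvas pixels still holding the fill value."""
    total = sum(len(row) for row in canvas)
    return sum(row.count(GREY) for row in canvas) / total
```

For a detector, each image's box labels would also need to be shifted by that image's `(x0, y0)` offset and clipped to the canvas; that step is left out here. `grey_fraction` only gives a meaningful number when the images themselves do not contain the fill value.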

glenn-jocher avatar Jan 21 '20 00:01 glenn-jocher

@glenn-jocher

Are you sure your Swish implementation was correct?

Yes, the Swish implementation is correct, since Swish improves accuracy for other models: EfficientNet, MixNet, PeleeNet, CSPPeleeNet, ... https://github.com/AlexeyAB/darknet/issues/3994#issuecomment-565692356
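For readers comparing the two activations discussed in this thread, here are the standard scalar definitions as a small Python sketch: Swish as `x * sigmoid(x)` (the beta = 1 form) and Mish as `x * tanh(softplus(x))`. This is not the darknet implementation; the helper names and the numerically stable formulations are choices made for this example.

```python
import math


def sigmoid(x):
    # Logistic function, split by sign so math.exp never overflows.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def softplus(x):
    # log(1 + exp(x)), rewritten to stay finite for large |x|.
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def swish(x):
    # Swish with beta = 1: x * sigmoid(x).
    return x * sigmoid(x)


def mish(x):
    # Mish: x * tanh(softplus(x)).
    return x * math.tanh(softplus(x))
```

Written this way, Mish evaluates an exponential, a logarithm and a tanh per element, while Swish needs a single exponential, which is one plausible reason for the speed difference mentioned above.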

but there are some practical problems with overlap, how to align and build the 3x3 mosaic etc

Yes, there are problems for the detector, but for the classifier there is no problem doing 3x3 instead of 2x2: https://github.com/AlexeyAB/darknet/issues/4432

AlexeyAB avatar Jan 21 '20 00:01 AlexeyAB

@AlexeyAB ah yes, that's true. For classification tasks like imagenet the 3x3 mosaic is much simpler :)

glenn-jocher avatar Jan 21 '20 01:01 glenn-jocher

@glenn-jocher @AlexeyAB

A mosaic-like paper: https://arxiv.org/pdf/2004.12432.pdf

WongKinYiu avatar Apr 28 '20 12:04 WongKinYiu