[RFC] Batteries Included - Phase 3

🚀 The feature

Note: To track the progress of the project check out this board.

This is the 3rd phase of TorchVision's modernization project (see phases 1 and 2). We aim to keep TorchVision relevant by ensuring it provides, off the shelf, all the primitives, model architectures and recipe utilities necessary to produce SOTA results for the supported Computer Vision tasks.

1. New Primitives

To enable our users to reproduce the latest state-of-the-art research we will enhance TorchVision with the following data augmentations, layers, losses and other operators:

Data Augmentations

[ ] AutoAugment for Detection [1, 2] - #6224 #6609
[ ] Mosaic [1, 2] - #6534
[ ] Mixup for Detection [1, 2]

Losses

[ ] Dice Loss [1, 2] - #6435
[ ] Poly Loss [1, 2] - #6439 #6457
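For reference, the soft Dice loss can be sketched as follows. This is a minimal pure-Python illustration that assumes per-pixel foreground probabilities and binary targets flattened into lists. The name `dice_loss` and the `eps` smoothing term are illustrative assumptions, not the API settled on in #6435.

```python
def dice_loss(probs, targets, eps=1e-6):
    """Soft Dice loss: 1 - (2 * sum(p * t) + eps) / (sum(p) + sum(t) + eps).

    probs: predicted foreground probabilities in [0, 1]
    targets: ground-truth labels (0 or 1), same length as probs
    """
    if len(probs) != len(targets):
        raise ValueError("probs and targets must have the same length")
    # Soft intersection between prediction and target
    intersection = sum(p * t for p, t in zip(probs, targets))
    denominator = sum(probs) + sum(targets)
    # eps keeps the ratio defined (and the loss at 0) when both masks are empty
    return 1.0 - (2.0 * intersection + eps) / (denominator + eps)
```

A perfect prediction gives a loss of 0 and disjoint masks give a loss close to 1.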

Operators added in PyTorch Core

[ ] LARS Optimizer [1, 2]
[ ] LAMB Optimizer [1, 2]
[x] Polynomial LR Scheduler [1, 2] - code - https://github.com/pytorch/pytorch/pull/82769
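The polynomial schedule decays the learning rate from its base value to zero over a fixed number of steps. Its closed form can be sketched in plain Python as below. `polynomial_lr` is a hypothetical helper for illustration; in practice you would use `torch.optim.lr_scheduler.PolynomialLR(optimizer, total_iters=..., power=...)` from the PR linked above, available in recent PyTorch releases.

```python
def polynomial_lr(base_lr, step, total_iters, power=1.0):
    """Learning rate after `step` scheduler steps under polynomial decay.

    Decays from base_lr to 0 over total_iters steps, then stays at 0.
    """
    if total_iters <= 0:
        raise ValueError("total_iters must be positive")
    # Clamp progress so the rate never goes negative after total_iters
    progress = min(step, total_iters) / total_iters
    return base_lr * (1.0 - progress) ** power
```

With `power=1.0` the decay is linear; larger powers drop the rate faster early on.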

2. New Architectures & Model Iterations

To ensure that our users have access to the most popular SOTA models, we will add the following architectures along with pre-trained weights:

Image Classification

[x] Swin Transformer V2 - #6242 #6246
[ ] MobileViT v1 & v2 [1, 2] - #6404
[ ] MaxViT - #6342

Video Classification

[x] MViTv2 [1] - #6373
[ ] Swin3d [1] - #6499 #6521
[x] S3D [1] - #6402 #6412 #6537

3. Improved Training Recipes & Pre-trained models

To ensure that our users have access to strong baselines and SOTA weights, we will improve our training recipes to incorporate the newly released primitives and offer improved pre-trained models:

Reference Scripts

[ ] Update the Reference Scripts to use the latest primitives - #6405

Pre-trained weights

[ ] Improve the accuracy of Video models

Other Candidates

There are several other Operators (#5414), Losses (#2980), Augmentations (#3817) and Models (#2707) proposed by the community. Here are some potential candidates that we could implement depending on bandwidth. Contributions are welcome for any of the below:

YOLOX [1] - #6341
DeTR - #5922
U-Net - #6610 #6611
MViTv2 for Images [1]
Video Transformer Network [1]
MTV
Deformable DeTR
Shortcut Regularizer (FX-based)

cc @datumbox @vfdev-5

Jul 27 '22 14:07 datumbox

Tagging a few of the regular contributors in case they are interested in specific items: @abhi-glitchhg @federicopozzi33 @frgfm @lezwon @oke-aditya @xiaohu2015 @yassineAlouini @zhiqwang

Feel free to propose additional candidates.

Jul 27 '22 17:07 datumbox

I will be happy to take losses :) dice loss first.

Jul 27 '22 18:07 oke-aditya

Like Christmas in July hehe :grin:

I have a few questions though:

For losses, I have already implemented Poly loss on my end. Do you want us to do only the Python implementation, or also the C++/CUDA bindings? (I saw that there are some in PyTorch core, so I'm not sure what the target is here.)
About optimizers, I also have implementations of LARS & LAMB. Should we open a PR directly on Core, or do we need to contact them in a dedicated issue beforehand?
About models, I'm happy to do the implementation, but I don't have the hardware to train them on ImageNet with the usual procedure. Do we need to train them as well?

Looking forward to helping with the next release :)

Jul 27 '22 21:07 frgfm

I'd like to take the Polynomial LR scheduler!

About schedulers, should we open a PR directly on Core, or do we need to contact them in a dedicated issue beforehand?

Jul 28 '22 04:07 federicopozzi33

Thanks for offering to help! We are lucky to have you guys! :)

@frgfm excellent questions. Let me try to provide some more context here:

Losses will be added to TorchVision for now with a Python-only implementation. Ideally we should reuse Core's existing methods that have C++/CUDA bindings as much as possible. That's particularly true for PolyLoss, where we want to reuse Core's cross_entropy rather than rewrite it with pure tensor ops. Unless I missed a development on Core (in which case please correct me and provide a reference), neither Poly nor Dice is planned to be added there.
For the 2 optimizers and 1 scheduler, the PR should go directly to Core, but we will provide help to maximize the chances of getting it landed. Due to the nature of the PR, it has a higher risk of not getting merged, but I've already spoken with some Core devs about it and I'm hopeful we can get it in.
For the models, the plan is to follow the process in our new model contribution guidelines. The TL;DR is that yes, we want the PR to contain weights that prove at least a tiny variant of the model works as expected, and then we will help you train them by running the e2e training on our internal infra.
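The point about reusing cross_entropy for PolyLoss works because Poly-1 only needs the per-sample cross-entropy: the target-class probability can be recovered as p_t = exp(-CE), giving CE + epsilon * (1 - p_t). The plain-Python sketch below illustrates this; the hand-written `cross_entropy` stands in for `torch.nn.functional.cross_entropy(..., reduction="none")`, and the names and the default `epsilon=1.0` are illustrative assumptions.

```python
import math


def cross_entropy(logits, target):
    """Per-sample cross-entropy from raw logits (numerically stable log-softmax)."""
    max_logit = max(logits)
    log_sum_exp = max_logit + math.log(sum(math.exp(z - max_logit) for z in logits))
    return log_sum_exp - logits[target]


def poly1_loss(logits, target, epsilon=1.0):
    """Poly-1 loss: CE + epsilon * (1 - p_t), where p_t = exp(-CE)."""
    ce = cross_entropy(logits, target)
    # Probability of the target class, recovered from CE without a second softmax
    p_t = math.exp(-ce)
    return ce + epsilon * (1.0 - p_t)
```

With `epsilon=0.0` the loss reduces to plain cross-entropy, so the only new work on top of Core's op is one `exp` and an elementwise add.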

Jul 28 '22 09:07 datumbox

Hello all 👋 Bit late to the party!

Is mosaic still available? If not I would like to try it.

Aug 03 '22 08:08 abhi-glitchhg

@abhi-glitchhg Mosaic is available. It's a bit unclear how it will be implemented at the moment, as there are multiple approaches seen online. I would prefer it if we can implement it as a Transform (rather than a Dataset or preloader etc.), potentially similar to what we do for MixUp or SimpleCopyPaste. I think it would be best to decouple its addition from the Transforms V2 initiative and add it to the references first. Then @pmeier and @vfdev-5 can propose how to move forward with it using the new API.

To test the new transform we can use a similar approach to #5825: the contributor provided enough visual proof that the transform works as expected, and then I helped verify it by training models on internal infra. Let me know if that makes sense to you.
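As a rough illustration of the Transform-style approach suggested above, a heavily simplified Mosaic can be sketched as a function that takes four (image, boxes) samples and returns one. Assumptions: all images share the same size, images are plain H x W lists, and the mosaic center is fixed. Common implementations (e.g. in the YOLO family) typically pick a random center, crop the result and clip or drop boxes, which is omitted here. The name `mosaic` is hypothetical.

```python
def mosaic(samples):
    """Combine four (image, boxes) samples into one 2x2 mosaic.

    Simplified: assumes all four images share the same H x W and uses a fixed
    center, so no cropping or resizing is needed. Images are H x W lists of
    pixel values; boxes are (x1, y1, x2, y2) tuples in pixel coordinates.
    """
    if len(samples) != 4:
        raise ValueError("mosaic expects exactly four samples")
    height = len(samples[0][0])
    width = len(samples[0][0][0])
    canvas = [[0] * (2 * width) for _ in range(2 * height)]
    out_boxes = []
    # Quadrant offsets: top-left, top-right, bottom-left, bottom-right
    offsets = [(0, 0), (width, 0), (0, height), (width, height)]
    for (image, boxes), (dx, dy) in zip(samples, offsets):
        if len(image) != height or any(len(row) != width for row in image):
            raise ValueError("all images must have the same size")
        # Paste the image into its quadrant
        for y in range(height):
            for x in range(width):
                canvas[y + dy][x + dx] = image[y][x]
        # Shift each box by its quadrant offset so it still covers the object
        out_boxes.extend((x1 + dx, y1 + dy, x2 + dx, y2 + dy) for x1, y1, x2, y2 in boxes)
    return canvas, out_boxes
```

Drawing the shifted boxes on the output canvas is a quick way to produce the kind of visual proof mentioned above.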

Aug 03 '22 10:08 datumbox

BTW @lezwon just let us know that he is busy, so AutoAugment for Detection is also up for grabs if someone wants it. See #6224

Aug 03 '22 10:08 datumbox

@datumbox I can work on MobileViT. 👌

Aug 06 '22 16:08 yassineAlouini

I wanted to know how useful it would be to implement network architectures without training them, validating the implementations just by using/adapting/porting the weights released by the authors.

I can provide some specific examples if needed.

Aug 09 '22 07:08 federicopozzi33

@federicopozzi33 Though the final training (especially of the large variants) is done by our team, we typically ask contributors to train at least one variant of the architecture to prove it works. The hardest part of such contributions is often reproducing the accuracies reported in the paper, and that's why we request this. We've been known to be flexible though, especially if a contributor has experience implementing and contributing similar architectures to us. Another approach would be to partner with another contributor who has access to infra and co-author the PR. That's the approach taken on the FCOS model by @xiaohu2015 and @zhiqwang.

Aug 09 '22 07:08 datumbox

@yassineAlouini I realized that my fat finger gave you a thumbs down instead of thumbs up on your comment to work on MobileViT. Sorry about that. Are you still interested in it?

Aug 10 '22 14:08 datumbox

@datumbox Yes I am and I understood that you meant :+1: instead so all is good. I should start on Friday. :ok_hand:

Aug 10 '22 14:08 yassineAlouini

I'd like to take on LARS optimizer, but I have a question: to test the correctness of the optimizer, is it required to reproduce the experiments of the paper?
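For context on what has to match a reference implementation, the core of LARS is a layer-wise trust ratio that rescales an SGD-with-momentum step. The plain-Python sketch below handles one layer flattened into a list. The function name, defaults and the fallback of `local_lr = 1.0` when a norm is zero are illustrative assumptions; implementations differ on such details (e.g. excluding biases and norm layers), which is exactly what a comparison against a reference would need to pin down.

```python
import math


def lars_step(weights, grads, momentum_buf, lr=0.1, momentum=0.9,
              weight_decay=1e-4, trust_coefficient=0.001, eps=1e-9):
    """One LARS update for a single layer (parameter flattened to a list).

    Returns (new_weights, new_momentum_buf).
    """
    w_norm = math.sqrt(sum(w * w for w in weights))
    g_norm = math.sqrt(sum(g * g for g in grads))
    # Layer-wise trust ratio: eta * ||w|| / (||g|| + wd * ||w||)
    if w_norm > 0 and g_norm > 0:
        local_lr = trust_coefficient * w_norm / (g_norm + weight_decay * w_norm + eps)
    else:
        local_lr = 1.0  # assumed fallback when a norm is zero
    new_buf, new_weights = [], []
    for w, g, v in zip(weights, grads, momentum_buf):
        update = g + weight_decay * w
        v = momentum * v + lr * local_lr * update
        new_buf.append(v)
        new_weights.append(w - v)
    return new_weights, new_buf
```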

Aug 11 '22 08:08 federicopozzi33

@federicopozzi33 I don't think it's required to reproduce experiments but we would need to be very careful to ensure the optimizer works the same as a reference implementation. If reproducing experiments is necessary, I can run them for you.

@frgfm you said you had already implementations, are you still interested and have the time to contribute? Perhaps you could work with Federico.

Let me know your preferences guys and we can come up with a plan. Because the contribution will land on Core, we would need to align with their practices. The earlier PR on PolynomialLR went super smoothly, so we can try replicating that approach.

Aug 11 '22 08:08 datumbox

@federicopozzi33 I don't think it's required to reproduce experiments but we would need to be very careful to ensure the optimizer works the same as a reference implementation. If reproducing experiments is necessary, I can run them for you.

Ok. There should be some reference implementations.

@frgfm you said you had already implementations, are you still interested and have the time to contribute? Perhaps you could work with Federico.

Let me know your preferences guys and we can come up with a plan. Because the contribution will land on Core, we would need to align with their practices. The earlier PR on PolynomialLR went super smoothly, so we can try replicating that approach.

Oh sorry, I looked at the main thread, without paying attention to the other messages.

@frgfm let me know if you're still interested in contributing; I can pick other issues without any problem :)

Aug 11 '22 09:08 federicopozzi33

@federicopozzi33 @frgfm @yassineAlouini @abhi-glitchhg @oke-aditya It would be great if you can either open issues with the items you plan to work on, or open dummy (empty) initial PRs with them so that we can link from the ticket and know which work is assigned to whom.

This would allow other community members to pick up work. I would also recommend that each person take one task, so that we can progress faster without blocking others who want to contribute (though I'm happy to group together things that make sense, such as the losses or the optimizers, if that's what we want).

Aug 11 '22 10:08 datumbox

Will do @datumbox :ok_hand:

Aug 11 '22 12:08 yassineAlouini

https://github.com/pytorch/vision/issues/6404 @datumbox not sure if this is the proper way to do it (since it's my first time), please comment/enhance if you have some time. Will start exploring the code soon.

Aug 12 '22 05:08 yassineAlouini

Sorry everyone, I was away from computers for a few days :sweat_smile:

@frgfm you said you had already implementations, are you still interested and have the time to contribute? Perhaps you could work with Federico.

Sure!

@federicopozzi33 @frgfm @yassineAlouini @abhi-glitchhg @oke-aditya It would be great if you can either open issues with the items you plan to work on, or open dummy (empty) initial PRs with them so that we can link from the ticket and know which work is assigned to whom.

That's a good idea, do you prefer issues that you can add as "tasks" in this very issue?

Aug 16 '22 19:08 frgfm

That's a good idea, do you prefer issues that you can add as "tasks" in this very issue?

Sure. I just want to make sure we have an owner for each algorithm and know which ones are available. Please create issues for those that you would like to tackle and assign yourself. If something changes and you can't pick them up, please let me know so I can try to find a different owner :)

Aug 16 '22 20:08 datumbox

Alright, I'll start with PolyLoss then and open an issue for it. One question: should I create an "nn" module in torchvision for this? Or will losses go into "ops"? Or another submodule for experimental features ("prototype/nn"?) that will later move upstream to Core?

Then I'll go for the LARS & LAMB optimizers if they aren't taken care of by then :+1:

And hopefully, there will still be some models left to check afterwards :)
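LAMB applies a LARS-style layer-wise trust ratio to an Adam update with decoupled weight decay. A plain-Python sketch for one layer flattened into a list is below. The name `lamb_step`, the defaults and the zero-norm fallback are illustrative assumptions; it uses bias correction and the identity scaling function for the weight norm, while some published variants differ on these details.

```python
import math


def lamb_step(weights, grads, m, v, step, lr=1e-3, beta1=0.9, beta2=0.999,
              eps=1e-6, weight_decay=0.01):
    """One LAMB update for a single layer (flattened to lists); step starts at 1.

    Returns (new_weights, new_m, new_v).
    """
    # Adam first and second moment estimates
    new_m = [beta1 * mi + (1 - beta1) * g for mi, g in zip(m, grads)]
    new_v = [beta2 * vi + (1 - beta2) * g * g for vi, g in zip(v, grads)]
    bc1 = 1 - beta1 ** step
    bc2 = 1 - beta2 ** step
    # Bias-corrected Adam direction plus decoupled weight decay
    update = [
        (mi / bc1) / (math.sqrt(vi / bc2) + eps) + weight_decay * w
        for mi, vi, w in zip(new_m, new_v, weights)
    ]
    # Layer-wise trust ratio ||w|| / ||update||
    w_norm = math.sqrt(sum(w * w for w in weights))
    u_norm = math.sqrt(sum(u * u for u in update))
    trust_ratio = w_norm / u_norm if w_norm > 0 and u_norm > 0 else 1.0
    new_weights = [w - lr * trust_ratio * u for w, u in zip(weights, update)]
    return new_weights, new_m, new_v
```

Because of the trust ratio, the norm of each layer's step equals `lr * ||w||` whenever both norms are non-zero, independently of the gradient scale.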

Aug 17 '22 17:08 frgfm

I also think an nn module, or a submodule for layers and losses, would be great. But that would be BC-breaking, etc.

Aug 17 '22 17:08 oke-aditya

@oke-aditya @frgfm Sorry I missed your messages.

We should add the losses flat on the ops, similar to what we do with every other loss. The nn module was too controversial and wasn't adopted.

Working on LARS/LAMB would also be awesome. We would need to coordinate with Core, but I think we can do it. Looking forward to your PRs!

Aug 19 '22 14:08 datumbox

We should add the losses flat on the ops, similar to what we do with every other loss. The nn module was too controversial and wasn't adopted.

Alright, but so far the other losses don't provide a module interface (only the functional API). I'll start with the functional version, and let me know in the PR if it's worth adding the module version as well :+1:

Working on LARS/LAMB would also be awesome. We would need to coordinate with Core, but I think we can do it. Looking forward to your PRs!

Fortunately, I have already followed the same syntax & folder organization as core. So the integration should be rather smooth :ok_hand:

Aug 20 '22 10:08 frgfm

I have talked with @oke-aditya on Slack and he seems interested in adding Swin3d; I will help support him on this. cc @datumbox

Aug 24 '22 16:08 YosuaMichael

Yes. Fortunately I'm well and in good health, so I will take Dice loss and this. 😊

Aug 24 '22 16:08 oke-aditya

Thanks a lot @oke-aditya for the help!

Aug 24 '22 17:08 YosuaMichael

I see that Mixup for Detection [1, 2] is still available. Can I pick it up?

Oct 07 '22 08:10 ambujpawar

@ambujpawar It is! Would you be happy to give the new Transforms API a try (it's at torchvision.prototype.transforms), or would you prefer to stick with hacking together an implementation based on what we have in the classification references?
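For orientation, one common variant of Mixup for detection blends the two images on a canvas large enough to hold both, without resizing, and keeps the boxes of both samples, each tagged with a loss weight (lambda or 1 - lambda). The plain-Python sketch below assumes H x W list images pasted at the top-left corner and a fixed `lam`; in practice lambda is usually sampled (e.g. from a Beta distribution). The name `detection_mixup` is hypothetical and not part of either Transforms API.

```python
def detection_mixup(sample_a, sample_b, lam=0.5):
    """Blend two (image, boxes, labels) samples for object detection.

    Images are H x W lists of floats; both are pasted at the top-left of a zero
    canvas of the larger size, so box coordinates stay valid without rescaling.
    Boxes from both samples are kept, each paired with a loss weight.
    """
    (img_a, boxes_a, labels_a), (img_b, boxes_b, labels_b) = sample_a, sample_b
    height = max(len(img_a), len(img_b))
    width = max(len(img_a[0]), len(img_b[0]))
    canvas = [[0.0] * width for _ in range(height)]
    # Weighted sum of the two images
    for img, weight in ((img_a, lam), (img_b, 1.0 - lam)):
        for y, row in enumerate(img):
            for x, pixel in enumerate(row):
                canvas[y][x] += weight * pixel
    boxes = list(boxes_a) + list(boxes_b)
    labels = list(labels_a) + list(labels_b)
    # Per-box weights to scale each box's contribution to the loss
    weights = [lam] * len(boxes_a) + [1.0 - lam] * len(boxes_b)
    return canvas, boxes, labels, weights
```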

Oct 07 '22 09:10 datumbox
