Are new models planned to be added?
🚀 Feature
Adding new models to the models section.
Motivation
Many new models have been proposed in recent years that do not exist in the models module. For example, the EfficientNets provide 8 models of different complexities that outperform everything else available at each complexity level.
Pitch
See Contributing to Torchvision - Models for guidance on adding new models.
- [x] RetinaNet #1697
- [x] MobileNetV3 #3252
- [x] MobileNet Backbones #2700
- [x] MobileNet Backbones for Detection #1999
- [x] MobileNet Backbones for Segmentation #3276
- [x] Single Shot Multi-Box (SSD) Detector #3760 #3403
- [x] SSD Lite #3757
- [x] EfficientNet (b0 to b7) #980
- [x] RegNet #2655
- [x] ViT #4594
- [x] FCOS #4961
- [x] ConvNeXt #5197 #5253
- [x] EfficientNetV2 #5450
- [x] Swin Transformer #5491
- [x] Improved MViT #6198
- [ ] Swin Transformer V2 - #6242 #6246
- [ ] DETR #5922
- [ ] DINO
- [ ] Deformable DETR
- [ ] EfficientDet
- [ ] YOLO #2074
- [ ] Cascade RCNN
- [ ] HTC
- [ ] CondInst
- [ ] SOLO
- [ ] DeepLabv3+ With Resnet #2689
- [ ] SE-ResNet and SE-ResNeXt #2179
- [ ] Inception-ResNet #3899 and V2 #5036
- [ ] NFNets
- [ ] ResNeSt
- [ ] ReXNet
- [ ] FBNet
- [ ] CoAtNet
Add pre-trained weights for the following variants:
- [x] Pretrained weights for ShuffleNetv2 1.5x and 2.0x + the Quantized versions. #5906 #3257
- [x] Pretrained weights for MNasnet 0_75 and 1_3. #3722 #6019
- [x] Variant + Pretrained weights for Resnext101_64x4d depth. #5935 #3485
- [x] Variant + Pretrained weights for Resnext152_32x4d depth. #3485
This request has come up often. Just linking all of those for reference.
archived - update issue instead
- [x] RetinaNet #1697
- [x] MobileNetV3 #3252
- [x] MobileNet Backbones #2700
- [x] MobileNet Backbones for Detection #1999
- [x] MobileNet Backbones for Segmentation #3276
- [x] Single Shot Multi-Box (SSD) Detector #3760 #3403
- [x] SSD Lite #3757
- [ ] DeepLabv3+ with ResNet #2689. Also, this had a discussion about Xception.
- [ ] Pretrained weights for ShuffleNetv2 1.5x and 2.0x depth. #3257
- [ ] MNasnet weights for 0_75 and 1_3 models. #3722
- [ ] RegNet #2655
- [ ] SE-ResNet and SE-ResNeXt #2179
- [ ] ResNest with ResNest FPN option for object detection.
- [ ] Resnext101_64x4d depth. #3485
- [ ] Resnext152_32x4d depth. #3485
- [ ] EfficientNet (b0 to b7) #980 (perhaps V2 models?)
- [ ] EfficientDet
- [ ] ReXNet
- [ ] DeiT
- [ ] DETR
- [ ] Inception-ResNet #3899
Edit by @datumbox: I shamelessly edited your comment and moved your fantastic up-to-date list on the issue for greater visibility.
Reply by @oke-aditya: I was actually going to suggest to do the same :smiley:
A generalized guideline for adding models is being added to the contributing.md file in this PR: #2663.
Hi,
To complement @oke-aditya's great answer: we will be adding more models to torchvision, including EfficientNets and MobileNetV3.
The current limitation is that we would like to ensure we can reproduce the pretrained models using the training scripts from [`references/classification`](https://github.com/pytorch/vision/tree/master/references/classification), but those models require a different training recipe than the one currently present there, so we will need to update those recipes before uploading the new models.
I hope to add Mish activation function.
@songyuc There is a closed feature request on PyTorch for adding Mish. You can comment over there for increased visibility so that Mish can be considered in the future. Link to the issue: https://github.com/pytorch/pytorch/issues/25584
First, thanks for your great work. I hope you can add the Swish activation and NFNets (High-Performance Large-Scale Image Recognition Without Normalization, https://arxiv.org/abs/2102.06171). In addition, I would like to ask when EfficientNet can be added. I found that it was mentioned in 2019, but it is now 2021. Referring to the MobileNetV3 model in torchvision, I built EfficientNet models (Test9_efficientNet), but I don't have a GPU to train with.
Hi @WZMIAOMIAO, the Swish activation function has been added to PyTorch (not torchvision) as nn.SiLU.
MobileNetV3 should hopefully be available in the next release.
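For reference, a minimal sketch of using the built-in activation (no torchvision change needed): Swish with beta=1 is exactly `nn.SiLU`, i.e. `silu(x) = x * sigmoid(x)`.

```python
# Sketch: Swish (beta = 1) is available in core PyTorch as nn.SiLU.
import torch
from torch import nn

act = nn.SiLU()
x = torch.linspace(-2.0, 2.0, 5)
y = act(x)  # identical to x * torch.sigmoid(x)
```

The functional form `torch.nn.functional.silu` is also available for use inside custom model code.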
@oke-aditya Thank you for your reply. I've seen MobileNetV3 in the torchvision repository. When will EfficientNet, RegNet and NFNet be added?
Hey guys, I was wondering if the PyTorch team is open to public contributions for these models? 🤔 I assume we can follow PR formats similar to the ones here and here, along with validation/proof that we can reproduce the paper results.
@stwinata Thanks for offering. Which models do you have in mind to contribute?
The process of model contribution has been a bit problematic (mainly due to the training bit) and we still haven't figured out all the details. But depending on the proposal, we might be able to work something out. :)
@stwinata Thanks for offering. Which models do you have in mind to contribute?
@datumbox Thanks for the quick reply! I am interested in DETR or EfficientDet. I was thinking that for a first commit DETR might be easier, since we can use DETR's original repo for reference and may be able to try loading its weights for preliminary validation.
The process of model contribution has been a bit problematic (mainly due to the training bit) and we still haven't figured out all the details. But depending on the proposal, we might be able to work something out. :)
Perhaps we can also try to determine a canonical pipeline for model contribution through this experience and document it so that others can contribute easily in the future 😃!
(mainly due to the training bit)
@datumbox Does this come down to lack of GPU resources? Or is it due to the need to validate that it can properly train?
@stwinata DETR sounds like a good addition to me. Since @fmassa is one of the main authors, I will let him have the final say on this.
Contributing models is tricky because:
- To reproduce a paper, it's an iterative process of code + recipe + training. Getting a PR that just adds the implementation is less useful, because one of the maintainers then needs to do the heavy lifting of reproducing the model, which is the time-consuming bit. This is why in the past we avoided accepting contributions in this space.
- On the other hand, if someone actually sends a PR that reproduces the paper and includes weights, then the only remaining work for the maintainers is to confirm the accuracy of the pre-trained weights and retrain the model using the reference script to ensure our recipe works as expected.
- GPU resources are not a concern for us but for the contributor. We can train models, but this infrastructure is not available to open-source contributors.
Happy to discuss more and see if it's worth doing this now.
@datumbox These comments make sense 😃
- To reproduce a paper, it's an iterative process of code + recipe + training. Getting a PR that just adds the implementation is less useful, because one of the maintainers then needs to do the heavy lifting of reproducing the model, which is the time-consuming bit. This is why in the past we avoided accepting contributions in this space.
- On the other hand, if someone actually sends a PR that reproduces the paper and includes weights, then the only remaining work for the maintainers is to confirm the accuracy of the pre-trained weights and retrain the model using the reference script to ensure our recipe works as expected.
Yeah, I agree; some might even say that getting models to be "useful", i.e. reproducing the paper results, is the fun bit 😃 I think future model contributions/PRs should include:
- Implementation
- Saved weights
- Proof of Paper's Benchmark reproduction
- Documentation
I think this way we can ease the load on the pytorch/vision maintainers and make PRs much more concrete and useful.
Perhaps we can also have a simple utility script that tests trained candidate implementations on various benchmarks (this might be another feature request 😄).
- GPU resources are not a concern for us but for the contributor. We can train models, but this infrastructure is not available to open-source contributors.
I also agree with this. Moreover, I think these days GPU resources, whether at home or through AWS and GCP, are ubiquitous enough for contributors to do the training themselves 😃
@stwinata Thanks for the comments. I think we agree. Below I write a few thoughts on the potential process we could adopt.
The minimum to merge such a contribution is:
- The PR must include the code implementation, have documentation and tests.
- It should also extend the existing reference scripts used to train the model.
- The weights need to closely reproduce the results of the paper in terms of accuracy.
- The PR description should include commands/configuration used to train the model, so that we can easily run them on our infra to verify.
Note that there are details here related to the code quality etc, but these are rules that apply in all PRs.
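As a purely hypothetical illustration of the "commands/configuration" point above, a PR description could include the exact reference-script invocation used; the model name and hyper-parameters below are placeholders, not a validated recipe:

```shell
# Placeholder command: adjust the model name, GPU count and
# hyper-parameters to the actual recipe used for the PR.
torchrun --nproc_per_node=8 references/classification/train.py \
    --model my_new_model --batch-size 128 --epochs 300 \
    --lr 0.5 --lr-scheduler cosineannealinglr \
    --output-dir ./checkpoints/my_new_model
```

Having this in the description lets maintainers re-run the same command on their own infrastructure to verify the weights.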
For someone who would be interested in adding a model, here are a few important considerations:
- Training big models requires lots of resources and the cost quickly adds up.
- Reproducing models is fun but also risky, as you might not always get the results reported in the paper. It might require a huge amount of effort to close the gap.
- The contribution might not get merged if we significantly lack in terms of accuracy, speed etc.
The above is a very big ask, I think. But if an OSS contributor is willing to give it a try despite these adversities, then we would be happy to pair up and help. This should happen in a coordinated way to:
- Ensure that the model in question is of interest and that nobody else is already working on adding it.
- Ensure there is an assigned maintainer providing support, guidance and regular feedback.
@fmassa let me know your thoughts on this as well.
I am aiming to add FCOS to torchvision: https://github.com/xiaohu2015/vision/blob/main/torchvision/models/detection/fcos.py
@xiaohu2015 Nice work, have you managed to reproduce the accuracies of the paper?
@fmassa Any thoughts on FCOS?
@datumbox Yes, I am working to implement it and reproduce the performance, but I think some time is needed.
I think FCOS would be a good addition. It is one of the top methods in https://paperswithcode.com/methods/category/object-detection-models that we don't yet have in torchvision, and @xiaohu2015's implementation seems very nice.
@datumbox @fmassa Hi, we have opened a PR with the FCOS code (https://github.com/pytorch/vision/pull/4961). Could you review it and give some advice?
Would it be possible to also have grayscale ImageNet weights for the usual models, along the lines described in Xie & Richmond?
Xie, Y. and Richmond, D., “Pre-training on grayscale ImageNet improves medical image classification,” in [Proceedings of the European Conference on Computer Vision (ECCV) Workshops], 476–484, Springer (September 2019).
Another CNN: https://github.com/facebookresearch/deit/blob/main/patchconvnet_models.py
In part informed by the discussions in this ticket, I am proposing new model contribution guidelines here. Your feedback/suggestions would be very valuable.
Hi there,
I'm doing few-shot classification and similarity learning, and currently the DINO DeiT backbone is a top-performing one on my datasets. Can we add it to torchvision.models? I'm willing to submit a PR with some guidance.
If you only want to add the pretrained weights, I think it is very easy, as torchvision supports multiple weights per model.
@Rusteam Thanks for the proposal.
Our model contribution guidelines are still a work in progress. One of the things we need to figure out is how to deal with contributions that are produced without our reference scripts. Right now, we require all weights to be reproducible with our references. There is one exception to this rule: when we port weights directly from a paper. Usually this is not the preferred solution either, and we typically do it when we want to offer the architecture but don't have the time to train the network from scratch (or it's too costly to do so).
Given that your proposal doesn't fall under the above exception, we would have to be able to reproduce the weight training with our scripts. Unfortunately, that's going to be tricky, because TorchVision doesn't have a training script for few-shot learning. We have a similarity reference script which hasn't really been maintained much. It's within our plans to improve support in the future, but currently you wouldn't be able to train the proposed models using our scripts.
cc @yoshitomo-matsubara because there were some discussions of adding better support of distillation in TorchVision.
Well, yeah, I guess you're right. Until you have some kind of contrib section, I can use torch.hub to use that model.
Not exactly a contrib section, but you could create a repo with a hubconf.py file so that the model is accessible via torch.hub.
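A minimal sketch of what that hubconf.py could look like; the repo path and the toy model below are hypothetical placeholders, not a real entrypoint:

```python
# Sketch of a minimal hubconf.py placed at the root of a GitHub repo
# so that torch.hub can discover and load the model by name.
import torch
from torch import nn

dependencies = ['torch']  # pip packages torch.hub verifies before loading

def my_backbone(pretrained=False, **kwargs):
    """Hypothetical entrypoint; torch.hub looks this function up by name."""
    model = nn.Sequential(nn.Conv2d(3, 16, 3), nn.ReLU())
    if pretrained:
        # In a real hubconf.py you would fetch a checkpoint here, e.g.:
        # state = torch.hub.load_state_dict_from_url('<weights-url>')
        # model.load_state_dict(state)
        pass
    return model

# A user could then load the model with (hypothetical repo path):
# model = torch.hub.load('someuser/somerepo', 'my_backbone')
```

This keeps the model usable from anywhere without it having to live inside torchvision itself.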
Please pin this issue.
@datumbox
I want to add another object detection model: ATSS.