
[FEEDBACK] Multi-weight support prototype API

Open datumbox opened this issue 2 years ago • 39 comments

🚀 Feedback Request

This issue is dedicated to collecting community feedback on the Multi-weight support API. Please review the dedicated article, where we describe the API in detail and provide an overview of its features.

We would love to get your thoughts, comments and input in order to finalize the API and include it in the new release of TorchVision.

cc @oke-aditya @frgfm @zhiqwang @xiaohu2015

datumbox avatar Dec 11 '21 09:12 datumbox

I was looking at the example in the article, and the one thing that jumped out at me as a potential source of confusion (especially for newcomers) was weights.transforms(). I understand that weights is just the variable name here, and I understand the need to associate the specific model('s weights) with its transforms. I think what I don't like is using the word "weights" for the whole "model" abstraction. One potential way to avoid confusion would be to be more specific: "PM.ResNet50.ImageNet1K_V1" would be the model's interface(?), and 'weights', 'transforms' and 'meta' would be accessed from there, like:

resnet = PM.ResNet50.ImageNet1K_V1
weights = resnet.weights
preprocess = resnet.transforms()
class_ids = resnet.meta["categories"]

And the example would turn into

from torchvision.prototype import models as PM
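# (Assumption for this sketch: `img` used below is an image tensor
#  loaded earlier, e.g. via torchvision.io.read_image.)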

# Step 1: Initialize model
resnet = PM.ResNet50.ImageNet1K_V1  # less verbose than ResNet50_Weights; also stops the ResNet50_weights confusion for those not using autocomplete IDEs
model = PM.resnet50(weights=resnet.weights) # or maybe just resnet
model.eval()

# Step 2: Initialize the inference transforms
preprocess = resnet.transforms()

# Step 3: Apply inference preprocessing transforms
batch = preprocess(img).unsqueeze(0)

# Step 4: Use the model and print the predicted category
prediction = model(batch).squeeze(0).softmax(0)
class_id = prediction.argmax().item()
score = prediction[class_id].item()
category_name = resnet.meta["categories"][class_id] # I like meta attributes
print(f"{category_name}: {100 * score}%")

Of course all this is a matter of style/preference, and there might be reasons it was implemented that way. I haven't looked at the code.

Overall I really like this feature, it will surely make life easier ❤️ no more copy pasting transforms, hurray! 🥳

cceyda avatar Dec 23 '21 09:12 cceyda

@cceyda Thanks for taking the time to provide detailed feedback. 😄

Originally we considered going with a proposal similar to what you suggested, as it would lead to less verbose names. Unfortunately the name ResNet50 is too close to the name ResNet, which is used for the main class of the model. This could confuse users into believing that it's an nn.Module model rather than an Enum. Other options were also considered, such as introducing new model interfaces that bundle everything together, but unfortunately this approach wouldn't be compatible with the existing model builder methods of TorchVision, which we had to maintain for backwards compatibility.

Concerning linking the transforms to the model vs the weights abstraction, one reason we selected the latter is to allow for extra flexibility. The same model variant can be used in different training recipes and produce weights that require different preprocessing transforms. A good real-world example of that in TorchVision is EfficientNet B1, which requires different preprocessing for v1 and v2.
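
To illustrate the idea (a minimal sketch; treat the exact enum member names here as illustrative rather than authoritative):

from torchvision.prototype import models as PM

# Two weight sets for the same efficientnet_b1 builder (member names assumed):
w_v1 = PM.EfficientNet_B1_Weights.ImageNet1K_V1
w_v2 = PM.EfficientNet_B1_Weights.ImageNet1K_V2

# Each weights entry bundles its own inference transforms, so the
# preprocessing (resize/crop sizes, interpolation, etc.) follows the recipe
# that produced the weights rather than the model architecture.
preprocess_v1 = w_v1.transforms()
preprocess_v2 = w_v2.transforms()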

For full transparency, the points you raised are valid and many of them were brought up during development (see discussions at #4937 and #4652). Unfortunately, due to the constraints imposed, there is no single perfect solution that leads to a concise, consistent, pythonic and backwards-compatible API. Multiple good solutions exist, and it depends on what one optimizes for. Let me know your thoughts.

datumbox avatar Dec 23 '21 16:12 datumbox

If multiple weights are available per model, it'd be nice if we could programmatically scan through all available weights and automatically associate them with their corresponding model function (i.e., lowercase resnet50). It could be done through a weights dictionary similar to the model_urls dictionary, for instance, whose keys are exactly the name of the corresponding model's function.
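
A rough sketch of what I mean (the dictionary name and layout here are hypothetical):

from torchvision.prototype import models as PM

# Hypothetical registry keyed by the model builder's function name,
# mirroring the old model_urls dict.
available_weights = {
    "resnet50": PM.ResNet50_Weights,
    "resnet101": PM.ResNet101_Weights,
}

for name, weights_enum in available_weights.items():
    builder = getattr(PM, name)  # e.g. PM.resnet50
    for w in weights_enum:       # enumerate all published weights
        model = builder(weights=w)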

On top of that, each model function should clearly document to the user what weights are available:

def resnet50(*, weights: Optional[ResNet50_Weights] = None, progress: bool = True, **kwargs: Any) -> ResNet:
    r"""ResNet-50
    Args:
        weights: value can be
            ResNet50_Weights.ImageNet1K_V1
            ResNet50_Weights.ImageNet1K_V2
    """
    weights = ResNet50_Weights.verify(weights)
    ...

divideconcept avatar Jan 02 '22 14:01 divideconcept

@divideconcept Thanks a lot for your feedback.

Your recommendations are indeed spot on. It's very reassuring for us that you identify the same limitations, as this is something we want to tackle in our 2022Q1 roadmap. More specifically:

  1. Introduce a model registration mechanism similar to the one used in the prototype datasets area. We see this as a separate RFC from this one, so I'll make sure to ping you early in the process to get your input.
  2. Improve the model documentation. @NicolasHug is already looking into ways to programmatically build the documentation directly from the meta-data and include more information, such as model sizes etc. We also want to better document the training processes used for estimating the weights.

Let's stay in touch concerning the prototype work. Given your work on TorchStudio, it would be great to coordinate and get your input early.

datumbox avatar Jan 02 '22 14:01 datumbox

I think the prototype API is a wonderful design. Now that you include the eval-mode data transforms in the weights, is it possible to also include the train-mode data transforms? The train-mode transforms are more complicated (e.g. mixup, sampler, label smoothing), so maybe it is troublesome.

xiaohu2015 avatar Jan 04 '22 11:01 xiaohu2015

@xiaohu2015 Thanks for the feedback!

You are right, currently we don't include the training transforms but instead provide a link to the training recipe. There was originally a lot of debate on whether we should include them and this is still an open question. For full transparency, I will list here the reasons why currently they are not included, so that the community is aware and can provide feedback:

  • Not all transforms used for training are currently part of the TorchVision core library, though we plan to port them to the main library soon.
  • Some transforms, such as MixUp and CutMix, might be done at the DataLoader level via collate_fn, so they don't fit nicely into the API (see the sketch after this list).
  • Some transforms might require a dependency that TorchVision doesn't currently have. For example, say that a community member contributes a model that was trained using cv2 transforms.
  • The transforms alone are not enough to describe how the model was trained. As you said, the sampler and other techniques affect the data that are actually fed to the model.
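
To make the collate_fn point concrete, here is a minimal sketch of batch-level MixUp (hypothetical code, not TorchVision's implementation):

import torch
from torch.utils.data import DataLoader

def mixup_collate_fn(batch, alpha=0.2):
    # MixUp blends pairs of samples *within a batch*, so it cannot be
    # expressed as a per-sample transform like weights.transforms().
    images = torch.stack([img for img, _ in batch])
    labels = torch.tensor([label for _, label in batch])
    lam = float(torch.distributions.Beta(alpha, alpha).sample())
    perm = torch.randperm(images.size(0))
    mixed = lam * images + (1.0 - lam) * images[perm]
    return mixed, labels, labels[perm], lam

# loader = DataLoader(dataset, batch_size=32, collate_fn=mixup_collate_fn)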

For now we opted to provide a reference to the training recipe, but I would love to hear your input on how we can boost reproducibility further.

datumbox avatar Jan 04 '22 11:01 datumbox

@datumbox For naming conventions, it would be better to stick to the Python naming convention, specifically for class names, which should be CapWords per PEP 8.

I found that some class names in the new API contain _, such as the model weight config classes, e.g. Wide_ResNet50_2_Weights, which I would name either WideResNet50K2Weights or WideResNet50WF2Weights, as the 2 comes from k, the widening factor from the paper.

I am also wondering why it has to be an Enum. It seems that all the new WeightsEnum classes come with some URL, so I feel a HuggingFace-hub style (input: URL; output: module, e.g. transform, model weights, model config, etc.) may be convenient for many PyTorch users.

yoshitomo-matsubara avatar Jan 18 '22 04:01 yoshitomo-matsubara

@yoshitomo-matsubara Thanks for the feedback.

For naming conventions, it would be better to stick to the Python naming convention, specifically for class names, which should be CapWords per PEP 8.

You are right. We are not happy with the current naming situation. This is why I was eager to discuss this here on Github, summarize the challenges and reasons we adopted the specific naming convention and, if possible, find alternatives.

The issue is that TorchVision already has extremely long model builder names that contain _. Here are a few problematic examples:

  • resnext101_32x8d: The underscore was added to separate the numbers and without it the name is unreadable.
  • shufflenet_v2_x1_5: The x1_5 represents the multiplier 1.5 with underscore replacing the dot.
  • keypointrcnn_resnet50_fpn: The underscore separates the different model architectures of the variant.

After lengthy offline and online discussions (see https://github.com/pytorch/vision/pull/4937#discussion_r749545547, https://github.com/pytorch/vision/issues/4652#issuecomment-946647712), the convention adopted was to take the Model builder name, capitalize it properly and append _Weights to it. We favoured this approach because we felt:

  1. Makes the existing long names more readable. Example: use FasterRCNN_ResNet50_FPN_Weights instead of FasterRCNNResNet50FPNWeights, to avoid cramping together model names that contain acronyms or packed encodings of hyper-parameters.
  2. Makes more apparent the link between the builder name and its weights. Example: use FasterRCNN_MobileNet_V3_Large_320_FPN_Weights instead of FRCNN_MNetV3L_320, which might be shorter but is less clear. This also allows us to automatically check the naming conventions in unit tests.
  3. Clarifies that the Enums are Weights, not model classes. Example: use ResNet50_Weights instead of ResNet50, as the latter can be confused for an nn.Module class.

If you have any suggestions on how to improve the situation, I would really love to hear them!

I am also wondering why it has to be an Enum.

First of all it is possible to pass strings to the model builder. So the following will work, though it's discouraged:

from torchvision.prototype import models as PM
model = PM.resnet50(weights="ResNet50_Weights.ImageNet1K_V2")
# or
model = PM.resnet50(weights="ImageNet1K_V2")

Why? Because this way you don't have access to the weights' transforms. So instead, if one wants to work with strings, it is advised to do:

from torchvision.prototype import models as PM
w = PM.get_weight("ResNet50_Weights.ImageNet1K_V2")
PM.resnet50(weights=w)
preprocess = w.transforms()
# ...

Enums allowed us to group together information related to the model weights in a single object and ensure it's always available for us during model construction. This allowed us to enforce schemas on the meta-data and retrieve key information (such as the backend and the non-quantized weights for the case of quantization, the num_classes for classification/detection/segmentation, etc). Moreover, note that TorchVision favours Enums over strings because of their better IDE integration, the ability to do static analysis on the code and the reduced chance of typos.
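
At a high level, the grouping looks roughly like this (a simplified sketch, not the exact code in torchvision/prototype/models/_api.py):

from dataclasses import dataclass
from enum import Enum
from typing import Any, Callable, Dict

@dataclass
class Weights:
    # Everything needed to use a set of pre-trained weights:
    url: str              # where the checkpoint lives
    transforms: Callable  # inference preprocessing matching the training recipe
    meta: Dict[str, Any]  # schema-checked meta-data (categories, metrics, ...)

class WeightsEnum(Enum):
    # Convenience accessors, so callers can write weights.transforms() and weights.meta
    def transforms(self, **kwargs: Any) -> Any:
        return self.value.transforms(**kwargs)

    @property
    def meta(self) -> Dict[str, Any]:
        return self.value.meta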

I feel a HuggingFace-hub style (input: URL; output: module, e.g. transform, model weights, model config, etc.) may be convenient

For backwards compatibility reasons, the output of model builders must be the model, not a tuple of things. Moreover there are other limitations, such as JIT-scriptability, that we had to factor in. The above API choice does not stop us from offering a registration mechanism that gives similar functionality in the future. It was actually within our plans to offer such a new API, but we decided to decouple it from this proposal to give more time for feedback. You can see an example of such a registration mechanism in the prototype datasets.

Let me know your thoughts.

datumbox avatar Jan 18 '22 10:01 datumbox

Thank you @datumbox for the clarification. Now the adoption of enum sounds more convincing to me.

In terms of grouping information by enums,

  1. I feel many of the long-name issues come from models using classifiers as backbones, like detection and segmentation. What about using a hierarchical structure, so that we could avoid redundancy and the use of _?
  2. It seems that the attribute names don't have to be a mixture of capital letters and _ but should be lowercased, e.g. FasterRCNN_ResNet50_FPN_Weights.Coco_V1 -> FasterRCNN_ResNet50_FPN_Weights.coco_v1

e.g., Grouping weights of FasterRCNN series like this

class WeightsCollection:
    # some new class (whose design is similar to WeightsEnum) that stores multiple weights
    # skip the detailed implementation here
    ...

class FasterRCNNWeights(WeightsEnum):
    resnet50_fpn_backbone = WeightsCollection(
        coco_v1=Weights(
            url="https://download.pytorch.org/models/fasterrcnn_resnet50_fpn_coco-258fb6c6.pth",
            transforms=CocoEval,
            meta={
                **_COMMON_META,
                "num_params": 41755286,
                "recipe": "https://github.com/pytorch/vision/tree/main/references/detection#faster-r-cnn-resnet-50-fpn",
                "map": 37.0,
            },
        ),
        default="coco_v1",
    )

    mobilenet_v3_large_fpn_backbone = WeightsCollection(
        coco_v1=Weights(
            url="https://download.pytorch.org/models/fasterrcnn_mobilenet_v3_large_fpn-fb6a3cc7.pth",
            transforms=CocoEval,
            meta={
                **_COMMON_META,
                "num_params": 19386354,
                "recipe": "https://github.com/pytorch/vision/tree/main/references/detection#faster-r-cnn-mobilenetv3-large-fpn",
                "map": 32.8,
            },
        ),
        default="coco_v1",
    )

    mobilenet_v3_large_320_fpn_backbone = WeightsCollection(
        coco_v1=Weights(
            url="https://download.pytorch.org/models/fasterrcnn_mobilenet_v3_large_320_fpn-907ea3f9.pth",
            transforms=CocoEval,
            meta={
                **_COMMON_META,
                "num_params": 19386354,
                "recipe": "https://github.com/pytorch/vision/tree/main/references/detection#faster-r-cnn-mobilenetv3-large-320-fpn",
                "map": 22.8,
            },
        ),
        default="coco_v1",
    )

instead of

class FasterRCNN_ResNet50_FPN_Weights(WeightsEnum):
    Coco_V1 = Weights(
        url="https://download.pytorch.org/models/fasterrcnn_resnet50_fpn_coco-258fb6c6.pth",
        transforms=CocoEval,
        meta={
            **_COMMON_META,
            "num_params": 41755286,
            "recipe": "https://github.com/pytorch/vision/tree/main/references/detection#faster-r-cnn-resnet-50-fpn",
            "map": 37.0,
        },
    )
    default = Coco_V1


class FasterRCNN_MobileNet_V3_Large_FPN_Weights(WeightsEnum):
    Coco_V1 = Weights(
        url="https://download.pytorch.org/models/fasterrcnn_mobilenet_v3_large_fpn-fb6a3cc7.pth",
        transforms=CocoEval,
        meta={
            **_COMMON_META,
            "num_params": 19386354,
            "recipe": "https://github.com/pytorch/vision/tree/main/references/detection#faster-r-cnn-mobilenetv3-large-fpn",
            "map": 32.8,
        },
    )
    default = Coco_V1


class FasterRCNN_MobileNet_V3_Large_320_FPN_Weights(WeightsEnum):
    Coco_V1 = Weights(
        url="https://download.pytorch.org/models/fasterrcnn_mobilenet_v3_large_320_fpn-907ea3f9.pth",
        transforms=CocoEval,
        meta={
            **_COMMON_META,
            "num_params": 19386354,
            "recipe": "https://github.com/pytorch/vision/tree/main/references/detection#faster-r-cnn-mobilenetv3-large-320-fpn",
            "map": 22.8,
        },
    )
    default = Coco_V1

What do you think?

yoshitomo-matsubara avatar Jan 18 '22 19:01 yoshitomo-matsubara

@yoshitomo-matsubara Thanks for the proposal and for including an example to clarify what you mean.

I feel many of the long-name issues come from models using classifiers as backbones, like detection and segmentation. What about using a hierarchical structure, so that we could avoid redundancy and the use of _?

Yes, that is correct, it comes from this existing convention of TorchVision. Using hierarchical structures is a viable solution, with the benefit that it groups together weights of the same family. I'm not sure it addresses the length aspect, as FasterRCNNWeights.mobilenet_v3_large_fpn_backbone.coco_v1 is lengthier than the existing option. Also note that building such a solution would require much more code to handle the hierarchy. I think that the biggest drawback is that it eliminates the 1-1 mapping between a model builder and its enum. This relationship is used for weight validation (static analysis + run-time verification), so effectively we would be disabling features of the existing API.

It seems that the attribute names don't have to be a mixture of capital letters and _ but should be lowercased

Correct, the values of the enums don't have to be capitalized like this. Here are a few current examples of such values: CocoWithVocLabels_V1, Coco_V1, ImageNet1K_V1, ImageNet1K_FBGEMM_V1, Kinetics400_V1. Originally I capitalized them like that to make values such as CocoWithVocLabels_V1 more readable, though admittedly the same can be achieved by renaming it to something like coco_voclabels_v1. If I were to change the capitalization, I would probably go with capital letters to make it consistent with the rest of TorchVision (example 1, 2, 3).

@NicolasHug @pmeier Any thoughts on changing the capitalization of the values of Weight Enums to all capitals?

datumbox avatar Jan 19 '22 09:01 datumbox

Given that PEP8 states:

Constants are [...] written in all capital letters with underscores separating words.

and enum fields are very much constants, I would prefer to also use this naming scheme for them.

pmeier avatar Jan 19 '22 09:01 pmeier

@pmeier

and enum fields are very much constants, I would prefer to also use this naming scheme for them.

I agree that enum fields like Coco_V1 are constants and should be in all capital letters with underscores, like COCO_V1.

@datumbox

I'm not sure it addresses the length aspect, as FasterRCNNWeights.mobilenet_v3_large_fpn_backbone.coco_v1 is lengthier than the existing option.

I think the main problem in the current naming is the _ in class names, and a hierarchical structure enables us to keep the class names short, e.g. FasterRCNNWeights; the lengthy parts then become field (attribute) names, where we can use _, like FasterRCNNWeights.MOBILENET_V3_LARGE_FPN_BACKBONE.COCO_V1, following @pmeier 's suggestion :)

I think that the biggest drawback is that it eliminates the 1-1 mapping between a model builder and its enum. This relationship is used for weight validation (static analysis + run-time verification), so effectively we would be disabling features of the existing API.

I might be missing something there, but wouldn't the change look as simple as below?

@handle_legacy_interface(weights=("pretrained", ResNetWeights.RESNET18.IMAGENET1K_V1))
def resnet18(*, weights: Optional[ResNetWeights.RESNET18] = None, progress: bool = True, **kwargs: Any) -> ResNet:
    weights = ResNetWeights.RESNET18.verify(weights)

    return _resnet(BasicBlock, [2, 2, 2, 2], weights, progress, **kwargs)


@handle_legacy_interface(weights=("pretrained", ResNetWeights.RESNET34.IMAGENET1K_V1))
def resnet34(*, weights: Optional[ResNetWeights.RESNET34] = None, progress: bool = True, **kwargs: Any) -> ResNet:
    weights = ResNetWeights.RESNET34.verify(weights)

    return _resnet(BasicBlock, [3, 4, 6, 3], weights, progress, **kwargs)

instead of

@handle_legacy_interface(weights=("pretrained", ResNet18_Weights.ImageNet1K_V1))
def resnet18(*, weights: Optional[ResNet18_Weights] = None, progress: bool = True, **kwargs: Any) -> ResNet:
    weights = ResNet18_Weights.verify(weights)

    return _resnet(BasicBlock, [2, 2, 2, 2], weights, progress, **kwargs)


@handle_legacy_interface(weights=("pretrained", ResNet34_Weights.ImageNet1K_V1))
def resnet34(*, weights: Optional[ResNet34_Weights] = None, progress: bool = True, **kwargs: Any) -> ResNet:
    weights = ResNet34_Weights.verify(weights)

    return _resnet(BasicBlock, [3, 4, 6, 3], weights, progress, **kwargs)

yoshitomo-matsubara avatar Jan 19 '22 17:01 yoshitomo-matsubara

@yoshitomo-matsubara

I might be missing something there, but wouldn't the change look as simple as below?

The current implementation uses the class information to verify that the provided weights are of the same type. See the code here.
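
For reference, the verification is roughly of this shape (a simplified sketch, not the exact source):

    @classmethod
    def verify(cls, obj):
        # Accept None, a string naming a member, or a member of this exact enum.
        if obj is not None:
            if type(obj) is str:
                obj = cls.from_str(obj)
            elif not isinstance(obj, cls):
                raise TypeError(
                    f"Invalid Weight class provided; expected {cls.__name__} "
                    f"but received {obj.__class__.__name__}."
                )
        return obj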

Frankly, I'm still not convinced that FasterRCNNWeights.MOBILENET_V3_LARGE_FPN_BACKBONE.COCO_V1 is much of an improvement over the existing one. It's longer, and even if you remove the extra _BACKBONE, you still have a string of the same size. Additionally, this increases complexity and requires extra code to handle the hierarchy, which is unnecessary because one can't use the weights of one model builder with another from the same family.

I agree that enum fields like Coco_V1 are constants and should be in all capital letters with underscores, like COCO_V1.

Sounds good. Happy to review a PR that makes the capitalization change of enum values across the prototype API.

datumbox avatar Jan 19 '22 19:01 datumbox

@datumbox

The current implementation uses the class information to verify that the provided weights are of the same type. See the code here.

Additionally, this increases complexity and requires extra code to handle the hierarchy, which is unnecessary because one can't use the weights of one model builder with another from the same family.

Does it mean these two functions (verify and from_str) will not work with the hierarchical structure? If so, note that ResNetWeights.RESNET18 in the above example is still a WeightsEnum, so the following should work:

weights = ResNetWeights.RESNET18.verify(weights)

https://github.com/pytorch/vision/blob/main/torchvision/prototype/models/_api.py#L50-L66


If the hierarchical structure is not something torchvision wants to offer, I came up with another idea. Instead of the hierarchical structure, what about introducing a weights module like torchvision.models.weights, and then having one model_name.py per model, such as faster_rcnn.py?

Then, torchvision/models/weights/faster_rcnn.py can have the current WeightsEnum style but without _ in the class name, like

class MobileNetV3LargeFpnWeights(WeightsEnum):
    COCO_V1 = Weights(
        url="https://download.pytorch.org/models/fasterrcnn_mobilenet_v3_large_fpn-fb6a3cc7.pth",
        transforms=CocoEval,
        meta={
            **_COMMON_META,
            "num_params": 19386354,
            "recipe": "https://github.com/pytorch/vision/tree/main/references/detection#faster-r-cnn-mobilenetv3-large-fpn",
            "map": 32.8,
        },
    )
    default = COCO_V1

While the class name itself has no link to the FasterRCNN class, since it is placed under torchvision.models.weights.faster_rcnn, i.e. torchvision.models.weights.faster_rcnn.MobileNetV3LargeFpnWeights, it would be obvious that it is for the FasterRCNN class.

It may also shorten the Python file for each model, compared to having the weight enum classes and the model classes + functions in one file.


Sounds good. Happy to review a PR that makes the capitalization change of enum values across the prototype API.

Regardless of whether or not my ideas above are adopted, I'm happy to make the changes and send a PR :D


P.S.

As __members__ in enum is a mappingproxy instance, I think the following part can be made more efficient by replacing

    @classmethod
    def from_str(cls, value: str) -> "WeightsEnum":
        for k, v in cls.__members__.items():
            if k == value:
                return v
        raise ValueError(f"Invalid value {value} for enum {cls.__name__}.")

with

    @classmethod
    def from_str(cls, value: str) -> "WeightsEnum":
        if value in cls.__members__:
            return cls.__members__[value]
        raise ValueError(f"Invalid value {value} for enum {cls.__name__}.")

I can send a separate PR for this if it looks good to you.

yoshitomo-matsubara avatar Jan 19 '22 20:01 yoshitomo-matsubara

@yoshitomo-matsubara Thanks for the ideas. I think for now we can move forward with the capitalization of the enum values. Good spot on the optimization of from_str; a PR to fix it is also welcome. The rest of the ideas require more careful thought.

Feel free to add me as reviewer on the PR so that we can discuss any potential name changes.

datumbox avatar Jan 19 '22 20:01 datumbox

We've merged the prototype API into main TorchVision. The target was to do this as early as possible before the next release to leave enough time for tests. Please keep sharing your feedback to help us iron out the rough edges. Issues/PRs are very welcome.

datumbox avatar Mar 22 '22 16:03 datumbox

Hi, this feature is very nice. But how can I find training information, such as epochs, augmentations, etc., through the prototype? I think this information is directly tied to the mAP.

class RetinaNet_ResNet50_FPN_Weights(WeightsEnum):
    COCO_V1 = Weights(
        url="https://download.pytorch.org/models/retinanet_resnet50_fpn_coco-eeacb38b.pth",
        transforms=ObjectDetection,
        meta={
            "task": "image_object_detection",
            "architecture": "RetinaNet",
            "publication_year": 2017,
            "num_params": 34014999,
            "categories": _COCO_CATEGORIES,
            "interpolation": InterpolationMode.BILINEAR,
            "recipe": "https://github.com/pytorch/vision/tree/main/references/detection#retinanet",
            "map": 36.4,
        },
    )
    DEFAULT = COCO_V1

Is it possible to attach a link indicating these key pieces of information? Currently, through https://pytorch.org/vision/stable/models.html#runtime-characteristics, I still cannot tell the training strategy. Was it trained with this training strategy? I am very confused.

hhaAndroid avatar Apr 05 '22 09:04 hhaAndroid

It's right there in your code snippet: "recipe": "https://github.com/pytorch/vision/tree/main/references/detection#retinanet". The link has all the information you need.

divideconcept avatar Apr 05 '22 09:04 divideconcept

You can find the training settings (training command) in the recipe URL.

xiaohu2015 avatar Apr 05 '22 09:04 xiaohu2015

@hhaAndroid This is very fair feedback; we must fix our documentation.

The exact training details should already be there (else it's a bug and we need to fix it ASAP), but it's a bit of a mess right now because we literally just released the Multi-weight support API. I confirm that what @divideconcept and @xiaohu2015 told you is correct. The recipe URL that exists in every weight's meta-data should link to the place where we have all the details. Sometimes that's a dedicated doc page and sometimes it's an issue/PR.

In Q2, as part of our documentation revamp, we plan to create cleaner model documentation. @NicolasHug is already looking into this and has created some prototypes at #5577. There is a dedicated feedback issue for our documentation #5511.

datumbox avatar Apr 05 '22 09:04 datumbox

@datumbox OK. Thank you for your patient reply. I'm confused because the mAP is 36.4 when RetinaNet is trained for 26 epochs. According to my understanding, 12 epochs can achieve this accuracy. So what am I not understanding?

hhaAndroid avatar Apr 06 '22 14:04 hhaAndroid

@hhaAndroid This is very fair feedback; we must fix our documentation.

The exact training details should already be there (else it's a bug and we need to fix it ASAP), but it's a bit of a mess right now because we literally just released the Multi-weight support API. I confirm that what @divideconcept and @xiaohu2015 told you is correct. The recipe URL that exists in every weight's meta-data should link to the place where we have all the details. Sometimes that's a dedicated doc page and sometimes it's an issue/PR.

In Q2, as part of our documentation revamp, we plan to create cleaner model documentation. @NicolasHug is already looking into this and has created some prototypes at #5577. There is a dedicated feedback issue for our documentation #5511.

Yes. We really need a document like the Model Zoo, otherwise the user has to open each model to understand.

hhaAndroid avatar Apr 06 '22 14:04 hhaAndroid

@datumbox OK. Thank you for your patient reply. I'm confused because the mAP is 36.4 when RetinaNet is trained for 26 epochs. According to my understanding, 12 epochs can achieve this accuracy. So what am I not understanding?

You are right, it should get the same mAP using a 1x (12 epoch) training schedule.

xiaohu2015 avatar Apr 06 '22 14:04 xiaohu2015

@datumbox OK. Thank you for your patient reply. I'm confused because the mAP is 36.4 when RetinaNet is trained for 26 epochs. According to my understanding, 12 epochs can achieve this accuracy. So what am I not understanding?

You are right, it should get the same mAP using a 1x (12 epoch) training schedule.

So why did the accuracy not improve with 26 epochs?

hhaAndroid avatar Apr 06 '22 14:04 hhaAndroid

Yes. We really need a document like the Model Zoo, otherwise the user has to open each model to understand.

Agreed. Any thoughts on whether this should live on the documentation page of the specific model builder (for example resnet50), or whether you indeed recommend a single Model Zoo doc? I believe @NicolasHug is investigating putting it on the dedicated model builder page, but I could be wrong.

So why is the 26 epoch accuracy not improved?

The retinanet model you refer to is very old and wasn't trained by any current member of our team. Unfortunately we don't have the detailed training log, but we know it was trained for 26 epochs. You are right to say that this was possibly overkill, but that's how it was produced and that's why we record it as such. Having said that, our team has now put processes in place to ensure our work is fully reproducible and that thorough checks are done prior to merging weights.

Finally, if you have more feedback about our documentation, we would love to hear it at #5511. If you have feedback concerning the new model builders, use this thread.

datumbox avatar Apr 06 '22 15:04 datumbox

Hi @datumbox,

I am updating my personal vision library to take into account the new API, and I find that it is a bit inconvenient to work with, mostly because it requires users to manually specify the weights class for each model.

Suppose that for some application we always use pre-trained weights, then in the old API the user only needs to provide a single argument that is the model name:

for m in ['resnet50', 'resnet101', 'resnet152']: # etc.
    model = torchvision.models.__dict__[m](pretrained=True)

With the new API, there is a second argument that depends on the first:

model = models.__dict__['resnet50'](weights=ResNet50_Weights.DEFAULT)
model = models.__dict__['resnet101'](weights=ResNet101_Weights.DEFAULT)

This is rather cumbersome (and a bit weird): if my model is resnet50, then obviously the weights should come from ResNet50_Weights, so why do I still have to specify it manually?

I would suggest the following:

model = models.__dict__['resnet50'](weights='default') # Use ResNet50_Weights.DEFAULT
model = models.__dict__['resnet50'](weights='v1') # Use ResNet50_Weights.IMAGENET1K_V1
model = models.__dict__['resnet50'](weights='v2') # Use ResNet50_Weights.IMAGENET1K_V2
model = models.__dict__['resnet50'](weights=ResNet50_Weights.DEFAULT) # Use ResNet50_Weights.DEFAULT

This way we would have the best of both worlds (the old and new APIs).

netw0rkf10w avatar Apr 20 '22 17:04 netw0rkf10w

@netw0rkf10w Thanks a lot for the feedback.

If I understand you correctly, you would like to be able to initialize the weights using strings instead of providing the whole Enum. Is that right? If that's what you mean, this is already supported.

Examples:

# These all work:
model = models.__dict__['resnet50'](weights='DEFAULT') # Use ResNet50_Weights.DEFAULT
model = models.__dict__['resnet50'](weights='IMAGENET1K_V1') # Use ResNet50_Weights.IMAGENET1K_V1
model = models.__dict__['resnet50'](weights='IMAGENET1K_V2') # Use ResNet50_Weights.IMAGENET1K_V2

# Or the shorter version
model = resnet50(weights="DEFAULT")
model = resnet50(weights="IMAGENET1K_V1")
model = resnet50(weights="IMAGENET1K_V2")

Currently the values must be all capitals (matching their enum values). Unfortunately we can't make them shorter (aka "v1", "v2") because our intention is to offer many more pre-trained weights on various datasets. In fact we've already added weakly-supervised weights trained on Instagram data (see #5708), and we plan to increase the number of datasets in the future. Thus we've opted for including the dataset name in the weight name.

So since this is supported, why is it not well documented? Two main reasons:

  1. Our documentation is still lacking; thankfully we are improving it quickly with the help of the community. See #5833 if you want to give us a hand.
  2. We recommend using the enums because they give you access to additional information, such as the preprocessing transforms and the meta-data. See this example for more info.

In the near future we also plan to build better registration mechanisms, so that you don't have to do:

# no more hacks like this:
model = models.__dict__['resnet50'](weights='DEFAULT') 

# but instead something like this:
model = models.get('resnet50')(weights='DEFAULT') 

We didn't roll this out yet because we want to offer a similar mechanism as in the new Datasets API, which is in development.
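
To give a rough idea of the direction, a registry of this shape could back models.get (a toy sketch, not the actual implementation):

from typing import Callable, Dict

_MODEL_REGISTRY: Dict[str, Callable] = {}

def register_model(name: str):
    # Decorator recording a model builder under its public name.
    def wrapper(fn: Callable) -> Callable:
        _MODEL_REGISTRY[name] = fn
        return fn
    return wrapper

def get(name: str) -> Callable:
    # Look up a builder by name, e.g. get("resnet50")(weights="DEFAULT")
    try:
        return _MODEL_REGISTRY[name]
    except KeyError:
        raise ValueError(f"Unknown model {name}") from None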

If you have more comments/concerns or if I completely misunderstood what you meant, please let me know. Your feedback is very welcome. :)

datumbox avatar Apr 20 '22 19:04 datumbox

@datumbox Thanks a lot for the detailed reply!

Examples:

# These all work:
model = models.__dict__['resnet50'](weights='DEFAULT') # Use ResNet50_Weights.DEFAULT
model = models.__dict__['resnet50'](weights='IMAGENET1K_V1') # Use ResNet50_Weights.IMAGENET1K_V1
model = models.__dict__['resnet50'](weights='IMAGENET1K_V2') # Use ResNet50_Weights.IMAGENET1K_V2

# Or the shorter version
model = resnet50(weights="DEFAULT")
model = resnet50(weights="IMAGENET1K_V1")
model = resnet50(weights="IMAGENET1K_V2")

But this is exactly what I was looking for!! I didn't know that this was already supported; there's no documentation other than your blog post, which didn't mention this feature (perhaps it was not yet implemented at that time).

So since this is supported, why is it not well documented? Two main reasons:

  1. Our documentation is still lacking; thankfully we are improving it quickly with the help of the community. See #5833 if you want to give us a hand.

Thanks. I'll join the discussion in #5833 later (though I feel that general/common documentation for the new weights API is currently more needed than the per-model documentation proposed in #5833; what I mean by general documentation would be something like a succinct summary of your blog post, updated with all the new features).

In the near future we also plan to build better registration mechanisms, so that you don't have to do:

# no more hacks like this:
model = models.__dict__['resnet50'](weights='DEFAULT') 

# but instead something like this:
model = models.get('resnet50')(weights='DEFAULT') 

We didn't roll this out yet because we want to offer a similar mechanism as in the new Datasets API, which is in development.

This is indeed a nice-to-have feature. Looking forward to its release!

Cheers!

netw0rkf10w avatar Apr 20 '22 19:04 netw0rkf10w

@datumbox Is there any plan to add the weights pre-trained (with self-supervision) on ImageNet-21K for vision_transformer? (cc @sallysyw @YosuaMichael) Thanks.

netw0rkf10w avatar Apr 21 '22 12:04 netw0rkf10w

No immediate plans, but it's something we can consider for future models. Until recently that was not an option, purely because we couldn't deploy multiple weights. Now that it's possible, we can assess it. I think that merits its own issue and would require some discussion about which models we should support.

datumbox avatar Apr 21 '22 14:04 datumbox