
Adding VideoMAE to HuggingFace Transformers

NielsRogge opened this issue 2 years ago · 10 comments

Hi VideoMAE team :)

I've implemented VideoMAE as a fork of 🤗 HuggingFace Transformers, and I'm planning to add it to the library soon (see https://github.com/huggingface/transformers/pull/17821). Here's a notebook that illustrates inference with it: https://colab.research.google.com/drive/1ZX_XnM0ol81FbcxrFS3nNLkmn-0fzvQk?usp=sharing
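
In a nutshell, running inference looks like this (a minimal sketch following the PR; the dummy video is a placeholder, the checkpoint is the one linked below, and class names could still change during review):

from transformers import VideoMAEFeatureExtractor, VideoMAEForVideoClassification
import numpy as np
import torch

# a dummy video: a list of 16 frames of shape (channels, height, width)
video = list(np.random.randn(16, 3, 224, 224))

feature_extractor = VideoMAEFeatureExtractor.from_pretrained("nielsr/videomae-base")
model = VideoMAEForVideoClassification.from_pretrained("nielsr/videomae-base")

inputs = feature_extractor(video, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# highest-scoring class among the Kinetics-400 labels
print(model.config.id2label[logits.argmax(-1).item()])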

The reason I'm adding VideoMAE is that I really like its simplicity: it was literally a one-line code change from ViT (nn.Conv2d -> nn.Conv3d).

As you may or may not know, any model on the HuggingFace hub has its own Git repository. For example, the VideoMAE-base checkpoint fine-tuned on Kinetics-400 can be found here: https://huggingface.co/nielsr/videomae-base. If you check the "files and versions" tab, it includes the weights. The hub uses Git LFS (Large File Storage) so that Git can handle large files such as model weights. This means that every model has its own Git commit history!
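
Because the weights are ordinary files in that Git repo, you can also fetch them programmatically; a minimal sketch with the huggingface_hub library (the filename is assumed to be the standard PyTorch weights file):

from huggingface_hub import hf_hub_download

# download a single file from the model repo; Git LFS is handled transparently
weights_path = hf_hub_download(repo_id="nielsr/videomae-base", filename="pytorch_model.bin")
print(weights_path)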

A model card can also be added to the repo, which is just a README.

Are you interested in creating an organization on the hub, such that we can store all model checkpoints there (rather than under my user name)?

Let me know!

Kind regards,

Niels
ML Engineer @ HuggingFace

NielsRogge avatar Jun 22 '22 12:06 NielsRogge

Hi @NielsRogge! Thanks for your suggestions! We have created an org at https://huggingface.co/videomae. Could you let us know how to upload our models correctly?

yztongzhan avatar Jun 23 '22 09:06 yztongzhan

Hi @NielsRogge! Is there any update?

yztongzhan avatar Jul 07 '22 07:07 yztongzhan

Hi @yztongzhan,

I've worked a bit further on it and have now also implemented VideoMAEForPreTraining, which includes the decoder and the loss computation. The PR is now ready for review and will be reviewed by my colleagues.
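
For reference, a forward pass through it looks roughly like this (a minimal sketch following the PR; the random video and mask are placeholders, and the repo id here is illustrative):

from transformers import VideoMAEFeatureExtractor, VideoMAEForPreTraining
import numpy as np
import torch

num_frames = 16
video = list(np.random.randn(num_frames, 3, 224, 224))

feature_extractor = VideoMAEFeatureExtractor.from_pretrained("nielsr/videomae-base")
model = VideoMAEForPreTraining.from_pretrained("nielsr/videomae-base")

pixel_values = feature_extractor(video, return_tensors="pt").pixel_values

# one boolean mask entry per tubelet patch
num_patches_per_frame = (model.config.image_size // model.config.patch_size) ** 2
seq_length = (num_frames // model.config.tubelet_size) * num_patches_per_frame
bool_masked_pos = torch.randint(0, 2, (1, seq_length)).bool()

outputs = model(pixel_values, bool_masked_pos=bool_masked_pos)
loss = outputs.loss  # reconstruction loss on the masked tubelets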

Also, would it be possible to create an organization on the hub for the Multimedia Computing Group, Nanjing University, with a short name (rather than just VideoMAE)? Because otherwise people will have to do something like:

from transformers import VideoMAEForVideoClassification

model = VideoMAEForVideoClassification.from_pretrained("VideoMAE/videomae-base-finetuned-kinetics")

for instance, which means they have to type "videomae" quite a lot 😂. Also, if newer models come out of the same group's research (such as AdaMixer), it makes sense to upload them to the same organization on the hub.

Regards,

Niels

NielsRogge avatar Jul 07 '22 13:07 NielsRogge

Hi @NielsRogge ,

Thanks for your update. We have created an organization account on the hub:

https://huggingface.co/MCG-NJU

You can use this organization to store our model checkpoints. BTW, you could also include our other repos, such as AdaMixer and MixFormer.

Best, Limin

wanglimin avatar Jul 08 '22 01:07 wanglimin

@NielsRogge Any update?

wanglimin avatar Jul 26 '22 14:07 wanglimin

Hi @wanglimin,

The model will soon be added to the library. I'll transfer the weights to the MCG-NJU organization today.

Are you interested in collaborating on a script for easy fine-tuning?
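
To give an idea of what that could look like, here's a toy sketch (random tensors stand in for a real video dataset; the number of labels and the single optimization step are purely illustrative):

import torch
from transformers import VideoMAEForVideoClassification

# hypothetical toy setup: 2 classes, random tensors instead of real videos
model = VideoMAEForVideoClassification.from_pretrained(
    "MCG-NJU/videomae-base",  # assumes the transferred checkpoint keeps this name
    num_labels=2,
)
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

pixel_values = torch.randn(2, 16, 3, 224, 224)  # (batch, frames, channels, height, width)
labels = torch.tensor([0, 1])

model.train()
outputs = model(pixel_values=pixel_values, labels=labels)  # the head computes cross-entropy
outputs.loss.backward()
optimizer.step()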

NielsRogge avatar Aug 02 '22 09:08 NielsRogge

I've transferred 3 models so far: https://huggingface.co/models?other=videomae.

To keep the model names reasonably short, I would use the following names:

model_names = [
    # Kinetics-400 checkpoints ("short" = pretrained for 800 epochs instead of 1600)
    "videomae-base-short",
    "videomae-base-short-finetuned-kinetics",
    "videomae-base",
    "videomae-base-finetuned-kinetics",
    "videomae-large",
    "videomae-large-finetuned-kinetics",
    # Something-Something-v2 checkpoints ("short" = pretrained for 800 epochs instead of 2400)
    "videomae-base-short-ssv2",
    "videomae-base-short-finetuned-ssv2",
    "videomae-base-ssv2",
    "videomae-base-finetuned-ssv2",
]
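
Loading one of these checkpoints from the new organization would then look like this, for instance (once the transfer is complete):

from transformers import VideoMAEForVideoClassification

model = VideoMAEForVideoClassification.from_pretrained("MCG-NJU/videomae-base-finetuned-kinetics")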

Is that OK for you? Also, are you interested in adding model cards to the repos on the hub? Each model has its own Git repo, and the model card is just a README (Markdown file).

NielsRogge avatar Aug 02 '22 10:08 NielsRogge

Hi @wanglimin,

VideoMAE has been added to the library! https://huggingface.co/docs/transformers/main/en/model_doc/videomae

Checkpoints are on the hub: https://huggingface.co/models?other=videomae

NielsRogge avatar Aug 08 '22 12:08 NielsRogge

Hi @NielsRogge! Thanks again for your efforts! We will add these links to the README.

yztongzhan avatar Aug 08 '22 12:08 yztongzhan

@NielsRogge , Thanks a lot for your help!

wanglimin avatar Aug 09 '22 00:08 wanglimin