VideoMAE
Adding VideoMAE to HuggingFace Transformers
Hi VideoMAE team :)
I've implemented VideoMAE as a fork of 🤗 HuggingFace Transformers, and I'm going to add it soon to the library (see https://github.com/huggingface/transformers/pull/17821). Here's a notebook that illustrates inference with it: https://colab.research.google.com/drive/1ZX_XnM0ol81FbcxrFS3nNLkmn-0fzvQk?usp=sharing
The reason I'm adding VideoMAE is that I really like its simplicity: it was literally a single-line code change from ViT (nn.Conv2d -> nn.Conv3d).
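To make that concrete, here's roughly what the patch embedding change amounts to (a simplified sketch using the default ViT/VideoMAE patch and tubelet sizes, not the actual modeling code):

```python
import torch
from torch import nn

# ViT: every 16x16 image patch becomes one token
image_patch_embed = nn.Conv2d(3, 768, kernel_size=16, stride=16)
image = torch.randn(1, 3, 224, 224)
print(image_patch_embed(image).flatten(2).shape)  # torch.Size([1, 768, 196])

# VideoMAE: the same idea in 3D, every 2x16x16 "tube" of frames becomes one token
video_patch_embed = nn.Conv3d(3, 768, kernel_size=(2, 16, 16), stride=(2, 16, 16))
video = torch.randn(1, 3, 16, 224, 224)  # (batch, channels, frames, height, width)
print(video_patch_embed(video).flatten(2).shape)  # torch.Size([1, 768, 1568])
```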
As you may or may not know, any model on the HuggingFace hub has its own Git repository. E.g. the VideoMAE-base checkpoint fine-tuned on Kinetics-400 can be found here: https://huggingface.co/nielsr/videomae-base. If you check the "files and versions" tab, it includes the weights. The model hub uses Git LFS (Large File Storage) so that Git can handle large files such as model weights. This means that any model has its own Git commit history!
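One nice consequence is that a specific branch, tag or commit of a checkpoint repo can be pinned when loading (a sketch; the revision value here is just the default branch):

```python
from transformers import VideoMAEForVideoClassification

# every checkpoint repo is a git repo, so any branch, tag or commit hash can be pinned
model = VideoMAEForVideoClassification.from_pretrained(
    "nielsr/videomae-base",
    revision="main",
)
```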
A model card can also be added to the repo, which is just a README.
Are you interested in creating an organization on the hub, such that we can store all model checkpoints there (rather than under my user name)?
Let me know!
Kind regards,
Niels
ML Engineer @ HuggingFace
Hi @NielsRogge! Thanks for your suggestions! We have created an org at https://huggingface.co/videomae. Could you let us know how to upload our models correctly?
Hi @NielsRogge! Is there any update?
Hi @yztongzhan,
I just worked a bit further on it; I've now also implemented VideoMAEForPreTraining, which includes the decoder and loss computation. The PR is now ready for review and will be reviewed by my colleagues.
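To give an idea of the API, pre-training roughly looks as follows (a sketch with a random video and a random mask; the checkpoint name just points to the weights currently under my user name):

```python
import numpy as np
import torch
from transformers import VideoMAEFeatureExtractor, VideoMAEForPreTraining

num_frames = 16
# a video is a list of frames; random frames stand in for a real clip here
video = [np.random.randint(0, 256, (224, 224, 3), dtype=np.uint8) for _ in range(num_frames)]

feature_extractor = VideoMAEFeatureExtractor.from_pretrained("nielsr/videomae-base")
model = VideoMAEForPreTraining.from_pretrained("nielsr/videomae-base")

pixel_values = feature_extractor(video, return_tensors="pt").pixel_values

# one token per (tubelet_size x patch_size x patch_size) tube; mask a random subset of them
num_patches_per_frame = (model.config.image_size // model.config.patch_size) ** 2
seq_length = (num_frames // model.config.tubelet_size) * num_patches_per_frame
bool_masked_pos = torch.randint(0, 2, (1, seq_length)).bool()

outputs = model(pixel_values, bool_masked_pos=bool_masked_pos)
print(outputs.loss)  # reconstruction loss on the masked tubes
```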
Also, would it be possible to create an organization on the hub for Multimedia Computing Group, Nanjing University, with a short name (rather than just VideoMAE)? Cause otherwise people will have to do:
```python
from transformers import VideoMAEForVideoClassification

model = VideoMAEForVideoClassification.from_pretrained("VideoMAE/videomae-base-finetuned-kinetics")
```
for instance, which means they have to type "videomae" quite a lot 😂 Also, if newer models come out that are also part of the same group's research (such as AdaMixer), it makes sense to upload them to the same organization on the hub.
Regards,
Niels
Hi @NielsRogge ,
Thanks for your update. We have created an organization account on the hub:
https://huggingface.co/MCG-NJU
You can use this organization for storing our model checkpoints. BTW, you could also include our other repos, such as AdaMixer and MixFormer.
Best, Limin
@NielsRogge Any update?
Hi @wanglimin,
The model will soon be added to the library. I'll transfer the weights to the MCG-NJU organization today.
Are you interested in collaborating on a script for easy fine-tuning?
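To sketch what I have in mind for such a script (very much a rough draft: the dummy dataset below just stands in for a real, decoded video dataset, and the checkpoint name and number of labels are examples):

```python
import numpy as np
import torch
from torch.utils.data import Dataset
from transformers import (
    Trainer,
    TrainingArguments,
    VideoMAEFeatureExtractor,
    VideoMAEForVideoClassification,
)

class DummyVideoDataset(Dataset):
    """Stand-in for a real video dataset: 16 random frames per clip + a label."""

    def __init__(self, feature_extractor, num_clips=8, num_labels=400):
        self.feature_extractor = feature_extractor
        self.num_clips = num_clips
        self.num_labels = num_labels

    def __len__(self):
        return self.num_clips

    def __getitem__(self, idx):
        video = [np.random.randint(0, 256, (224, 224, 3), dtype=np.uint8) for _ in range(16)]
        pixel_values = self.feature_extractor(video, return_tensors="pt").pixel_values[0]
        return {"pixel_values": pixel_values, "labels": torch.tensor(idx % self.num_labels)}

feature_extractor = VideoMAEFeatureExtractor.from_pretrained("MCG-NJU/videomae-base")
# a fresh classification head is added on top of the pre-trained encoder
model = VideoMAEForVideoClassification.from_pretrained("MCG-NJU/videomae-base", num_labels=400)

args = TrainingArguments(
    output_dir="videomae-base-finetuned",
    per_device_train_batch_size=2,
    learning_rate=5e-5,
    num_train_epochs=1,
)
trainer = Trainer(model=model, args=args, train_dataset=DummyVideoDataset(feature_extractor))
trainer.train()
```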
I've currently transferred 3 models: https://huggingface.co/models?other=videomae.
To make the model names not too long, I would use the following names:
```python
model_names = [
    # Kinetics-400 checkpoints (short = pretrained only for 800 epochs instead of 1600)
    "videomae-base-short",
    "videomae-base-short-finetuned-kinetics",
    "videomae-base",
    "videomae-base-finetuned-kinetics",
    "videomae-large",
    "videomae-large-finetuned-kinetics",
    # Something-Something-v2 checkpoints (short = pretrained only for 800 epochs instead of 2400)
    "videomae-base-short-ssv2",
    "videomae-base-short-finetuned-ssv2",
    "videomae-base-ssv2",
    "videomae-base-finetuned-ssv2",
]
```
Is that ok for you? Also, are you interested in adding model cards to the repos on the hub? Each model has its own git repo, and the model card is just a README (Markdown file).
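If it's easier, a model card can also be pushed programmatically with huggingface_hub; a minimal sketch (the repo id and local file path are just examples, and it assumes you're logged in with write access to the repo):

```python
from huggingface_hub import upload_file

# uploads a local README.md as the model card of the given repo
upload_file(
    path_or_fileobj="README.md",
    path_in_repo="README.md",
    repo_id="MCG-NJU/videomae-base",
)
```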
Hi @wanglimin,
VideoMAE has been added to the library! https://huggingface.co/docs/transformers/main/en/model_doc/videomae
Checkpoints are on the hub: https://huggingface.co/models?other=videomae
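Minimal usage now looks roughly like this (a sketch using random frames as a stand-in for a real video, with the Kinetics-400 fine-tuned checkpoint as an example):

```python
import numpy as np
import torch
from transformers import VideoMAEFeatureExtractor, VideoMAEForVideoClassification

# a video is a list of 16 frames (numpy arrays of shape (height, width, 3))
video = [np.random.randint(0, 256, (224, 224, 3), dtype=np.uint8) for _ in range(16)]

feature_extractor = VideoMAEFeatureExtractor.from_pretrained("MCG-NJU/videomae-base-finetuned-kinetics")
model = VideoMAEForVideoClassification.from_pretrained("MCG-NJU/videomae-base-finetuned-kinetics")

inputs = feature_extractor(video, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

print(model.config.id2label[logits.argmax(-1).item()])
```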
Hi @NielsRogge! Thanks again for your efforts! We will add these links to our README.
@NielsRogge , Thanks a lot for your help!