[Community contributions] Model cards
Hey friends! 👋
We are currently improving the Transformers model cards to make them more directly useful for everyone. The main goals are to:
- Standardize all model cards with a consistent format so users know what to expect when moving between different model cards or trying to learn how to use a new model.
- Include a brief description of the model (what makes it unique/different) written in a way that's accessible to everyone.
- Provide ready-to-use code examples featuring the `Pipeline`, `AutoModel`, and `transformers-cli` with available optimizations included. For large models, provide a quantization example so it's easier for everyone to run the model.
- Include an attention mask visualizer for currently supported models to help users visualize what a model is seeing (refer to #36630 for more details).
Compare the before and after model cards below:
With so many models in Transformers, we could really use a hand with standardizing the existing model cards. If you're interested in making a contribution, pick a model from the list below and get started!
Steps
Each model card should follow the format below. You can copy the text exactly as it is!
# add appropriate badges
<div style="float: right;">
<div class="flex flex-wrap space-x-1">
<img alt="" src="" >
</div>
</div>
# Model name
[Model name](https://huggingface.co/papers/...) ...
A brief description of the model and what makes it unique/different. Try to write this like you're talking to a friend.
You can find all the original [Model name] checkpoints under the [Model name](link) collection.
> [!TIP]
> Click on the [Model name] models in the right sidebar for more examples of how to apply [Model name] to different [insert task types here] tasks.
The example below demonstrates how to generate text based on an image with [`Pipeline`] or the [`AutoModel`] class.
<hfoptions id="usage">
<hfoption id="Pipeline">
insert pipeline code here
</hfoption>
<hfoption id="AutoModel">
add AutoModel code here
</hfoption>
<hfoption id="transformers-cli">
add transformers-cli usage here if applicable/supported, otherwise close the hfoption block
</hfoption>
</hfoptions>
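For a sense of what a filled-in snippet might look like, here is a minimal `Pipeline` sketch. The task, checkpoint, and prompt are illustrative assumptions only and should be replaced with ones appropriate to the model being documented.

```py
# Minimal illustrative sketch -- the task, checkpoint, and prompt are placeholders,
# not part of the template; swap them for the model this card documents.
import torch
from transformers import pipeline

pipe = pipeline(
    task="text-generation",
    model="meta-llama/Llama-3.1-8B-Instruct",  # hypothetical example checkpoint
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
print(pipe("Plants create energy through a process known as")[0]["generated_text"])
```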
Quantization reduces the memory burden of large models by representing the weights in a lower precision. Refer to the [Quantization](../quantization/overview) overview for more available quantization backends.
The example below uses [insert quantization method here](link to quantization method) to only quantize the weights to __.
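As an illustration of what such a snippet might look like, the sketch below quantizes the weights to 4-bit with bitsandbytes; the backend, checkpoint, and settings are assumptions and should be swapped for whatever the model actually supports.

```py
# Illustrative sketch only -- bitsandbytes 4-bit is just one possible backend,
# and the checkpoint below is a hypothetical placeholder.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # store weights in 4-bit
    bnb_4bit_quant_type="nf4",              # normal-float 4-bit quantization
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute in bf16
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B-Instruct",     # hypothetical example checkpoint
    quantization_config=quantization_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")
inputs = tokenizer("Plants create energy through a process known as", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=50)[0], skip_special_tokens=True))
```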
# add if this is supported for your model
Use the [AttentionMaskVisualizer](https://github.com/huggingface/transformers/blob/beb9b5b02246b9b7ee81ddf938f93f44cfeaad19/src/transformers/utils/attention_visualizer.py#L139) to better understand what tokens the model can and cannot attend to.
```py
from transformers.utils.attention_visualizer import AttentionMaskVisualizer

visualizer = AttentionMaskVisualizer("google/gemma-3-4b-it")
visualizer("<img>What is shown in this image?")
```
# upload image to https://huggingface.co/datasets/huggingface/documentation-images/tree/main/transformers/model_doc and ping me to merge
<div class="flex justify-center">
<img src=""/>
</div>
## Notes
- Any other model-specific notes should go here.
```py
<insert relevant code snippet here related to the note if it's available>
```
For examples, take a look at #36469 or the BERT, Llama, Llama 2, Gemma 3, PaliGemma, ViT, and Whisper model cards on the main version of the docs.
Once you're done or if you have any questions, feel free to ping @stevhliu to review. Don't add "fix" to your PR description to avoid automatically closing this issue.
I'll also be right there working alongside you and opening PRs to convert the model cards so we can complete this faster together! 🤗
Models
- [ ] albert - #37753
- [x] align - #38072
- [x] altclip - #38306
- [x] aria - #38472
- [ ] audio_spectrogram_transformer - assigned to @KishanPipariya
- [ ] auto
- [ ] autoformer - #37231
- [ ] aya_vision
- [ ] bamba
- [ ] bark
- [x] bart - #37858
- [ ] barthez
- [ ] bartpho
- [ ] beit
- [x] bert
- [ ] bert_generation
- [ ] bert_japanese - assigned to @KeshavSingh29
- [x] bertweet - #37981
- [x] big_bird - #37959
- [ ] bigbird_pegasus
- [x] biogpt - #38214
- [ ] bit
- [ ] blenderbot
- [ ] blenderbot_small
- [ ] blip - #38513
- [ ] blip_2 - assigned to @olccihyeon
- [ ] bloom
- [ ] bridgetower
- [ ] bros
- [x] byt5 - #38699
- [ ] camembert
- [x] canine - #38631
- [ ] chameleon
- [ ] chinese_clip
- [ ] clap
- [x] clip - #37040
- [ ] clipseg
- [ ] clvp
- [x] code_llama - #37115
- [ ] codegen
- [x] cohere - #37056
- [ ] cohere2
- [x] colpali - #37309
- [ ] conditional_detr
- [ ] convbert - #38470
- [ ] convnext - assigned to @aleksmaksimovic
- [ ] convnextv2 - assigned to @aleksmaksimovic
- [ ] cpm
- [ ] cpmant
- [ ] ctrl - assigned to @Ishubhammohole
- [ ] cvt - assigned to @sezan92
- [ ] dab_detr
- [ ] dac
- [ ] data2vec
- [ ] dbrx
- [ ] deberta - #37409
- [ ] deberta_v2
- [ ] decision_transformer
- [ ] deformable_detr
- [ ] deit
- [ ] deprecated
- [x] depth_anything - #37065
- [ ] depth_pro
- [ ] detr
- [ ] dialogpt
- [ ] diffllama
- [ ] dinat
- [x] dinov2 - #37104
- [ ] dinov2_with_registers
- [x] distilbert - #37157
- [x] dit - #38721
- [x] donut - #37290
- [ ] dpr
- [ ] dpt
- [ ] efficientnet - assigned to @Sudhesh-Rajan27
- [x] electra - #37063
- [ ] emu3
- [ ] encodec
- [ ] encoder_decoder
- [ ] ernie
- [ ] esm
- [x] falcon - #37184
- [x] falcon_mamba - #37253
- [ ] fastspeech2_conformer - #37377
- [ ] flaubert
- [ ] flava
- [ ] fnet
- [ ] focalnet
- [ ] fsmt
- [ ] funnel
- [ ] fuyu
- [x] gemma - #37674
- [x] gemma2 - #37076
- [x] gemma3
- [ ] git
- [ ] glm
- [ ] glpn
- [ ] got_ocr2
- [x] gpt2 - #37101
- [ ] gpt_bigcode
- [x] gpt_neo - #38505
- [ ] gpt_neox - #38550
- [ ] gpt_neox_japanese
- [ ] gpt_sw3
- [ ] gptj
- [x] granite - #37791
- [ ] granitemoe
- [ ] granitemoeshared
- [ ] grounding_dino
- [ ] groupvit
- [ ] helium
- [ ] herbert
- [ ] hiera
- [ ] hubert
- [ ] ibert
- [ ] idefics
- [ ] idefics2
- [ ] idefics3
- [ ] ijepa
- [ ] imagegpt
- [ ] informer
- [ ] instructblip
- [ ] instructblipvideo
- [x] jamba - #37152
- [ ] jetmoe
- [ ] kosmos2
- [ ] layoutlm
- [ ] layoutlmv2
- [ ] layoutlmv3 - #37155
- [ ] layoutxlm
- [ ] led
- [ ] levit
- [ ] lilt
- [x] llama
- [x] llama2
- [ ] llama3 - assigned to @capnmav77
- [ ] llava
- [ ] llava_next
- [ ] llava_next_video
- [ ] llava_onevision
- [x] longformer - #37622
- [ ] longt5
- [ ] luke
- [ ] lxmert
- [ ] m2m_100
- [x] mamba - #37863
- [x] mamba2 - #37951
- [ ] marian
- [ ] markuplm
- [ ] mask2former
- [ ] maskformer
- [x] mbart - #37619
- [x] mbart50 - #37619
- [ ] megatron_bert
- [ ] megatron_gpt2
- [ ] mgp_str
- [ ] mimi
- [x] mistral - #37156
- [ ] mistral3 - assigned to @cassiasamp
- [ ] mixtral - assigned to @darmasrmez
- [ ] mllama - #37647
- [ ] mluke
- [x] mobilebert - #37256
- [x] mobilenet_v1 - #37948
- [x] mobilenet_v2 - #37948
- [ ] mobilevit
- [ ] mobilevitv2
- [x] modernbert - #37052
- [x] moonshine - #38711
- [ ] moshi
- [ ] mpnet - assigned to @SanjayDevarajan03
- [ ] mpt
- [ ] mra
- [ ] mt5
- [ ] musicgen
- [ ] musicgen_melody
- [ ] mvp
- [ ] myt5
- [ ] nemotron
- [ ] nllb
- [ ] nllb_moe
- [ ] nougat
- [ ] nystromformer
- [ ] olmo
- [x] olmo2 - #38394
- [ ] olmoe
- [ ] omdet_turbo
- [ ] oneformer
- [x] openai - #37255
- [ ] opt
- [ ] owlv2
- [ ] owlvit
- [x] paligemma
- [ ] patchtsmixer
- [ ] patchtst
- [x] pegasus - #38675
- [ ] pegasus_x
- [ ] perceiver
- [ ] persimmon
- [x] phi - #37583
- [ ] phi3 - assigned to @arpitsinghgautam
- [ ] phi4_multimodal - assigned to @Tanuj-rai
- [ ] phimoe
- [ ] phobert
- [ ] pix2struct
- [ ] pixtral - assigned to @BryanBradfo
- [ ] plbart
- [ ] poolformer
- [ ] pop2piano
- [ ] prompt_depth_anything
- [ ] prophetnet
- [ ] pvt
- [ ] pvt_v2
- [x] qwen2 - #37192
- [x] qwen2_5_vl - #37099
- [ ] qwen2_audio
- [x] qwen2_moe - #38649
- [ ] qwen2_vl - assigned to @SaiSanthosh1508
- [ ] rag
- [ ] recurrent_gemma
- [ ] reformer
- [ ] regnet
- [ ] rembert
- [ ] resnet - assigned to @BettyChen0616
- [x] roberta - #38777
- [ ] roberta_prelayernorm
- [ ] roc_bert
- [x] roformer - #37946
- [ ] rt_detr
- [ ] rt_detr_v2
- [ ] rwkv
- [ ] sam
- [ ] seamless_m4t
- [ ] seamless_m4t_v2
- [ ] segformer - assigned to @GSNCodes
- [ ] seggpt
- [ ] sew
- [ ] sew_d
- [ ] shieldgemma2 - assigned to @BryanBradfo
- [x] siglip - #37585
- [x] siglip2 - #37624
- [ ] smolvlm - assigned to @udapy
- [ ] speech_encoder_decoder
- [ ] speech_to_text
- [ ] speecht5
- [ ] splinter
- [ ] squeezebert
- [ ] stablelm
- [ ] starcoder2
- [ ] superglue
- [ ] superpoint
- [ ] swiftformer
- [ ] swin - assigned to @BryanBradfo
- [ ] swin2sr
- [x] swinv2 - #37942
- [ ] switch_transformers
- [x] t5 - #37261
- [ ] table_transformer
- [ ] tapas
- [ ] textnet
- [ ] time_series_transformer
- [ ] timesformer
- [ ] timm_backbone
- [ ] timm_wrapper
- [ ] trocr
- [ ] tvp
- [ ] udop
- [ ] umt5
- [ ] unispeech
- [ ] unispeech_sat
- [ ] univnet
- [ ] upernet
- [ ] video_llava
- [ ] videomae - assigned to @mreraser
- [ ] vilt
- [ ] vipllava
- [ ] vision_encoder_decoder - assigned to @Bhavay-2001
- [ ] vision_text_dual_encoder
- [ ] visual_bert
- [x] vit
- [x] vit_mae - #38302
- [ ] vit_msn
- [ ] vitdet
- [ ] vitmatte
- [ ] vitpose - #38630
- [ ] vitpose_backbone
- [x] vits - #37335
- [ ] vivit
- [ ] wav2vec2 - assigned to @AshAnand34
- [ ] wav2vec2_bert - assigned to @AshAnand34
- [ ] wav2vec2_conformer - assigned to @AshAnand34
- [ ] wav2vec2_phoneme - assigned to @AshAnand34
- [ ] wav2vec2_with_lm - assigned to @AshAnand34
- [ ] wavlm
- [x] whisper
- [ ] x_clip
- [ ] xglm
- [x] xlm - #38595
- [x] xlm_roberta - #38596
- [x] xlm_roberta_xl - #38597
- [ ] xlnet
- [ ] xmod
- [ ] yolos
- [ ] yoso
- [ ] zamba
- [ ] zamba2
- [x] zoedepth - #37898
Hi. I would like to work on the model card for gemma 2.
Hi. I would like to work on the model card for mistral.
Hi @stevhliu, this is my first contribution so I have a really basic question. Should I clone every repo under mistralai? I just cloned the repo mistralai/Ministral-8B-Instruct-2410, but there are many other repos under mistralai. It's ok if I need to, but I just want to be sure.
Hey, I would like to work on the model card for llama3.
Hey @NahieliV, welcome! You only need to modify the mistral.md file. This is just for the model cards in the Transformers docs rather than the Hub.
Hey @stevhliu I would like to work on the model card for qwen2_5_vl.
@stevhliu Is it not possible to automate with an LLM?
Hi @stevhliu, I would be super grateful if you could let me work on the model card for code_llama.
Hey @stevhliu, I would like to work on the cohere model card.
Hey @stevhliu, I would like to contribute to the gpt2 model card.
Hey @stevhliu, I would like to contribute to the vitpose model card.
Hey @stevhliu, I would like to work on the electra model card
@stevhliu I will update the model card for depth_anything.
PR: #37065
Hey @stevhliu, I would like to contribute to the mixtral model card.
To the folks who have been raising PRs so far, just a quick question: did you have to install flax, tf-keras, sentencepiece, etc.?
Before making the changes, I'm trying to set up the environment following the steps here: https://github.com/huggingface/transformers/tree/main/docs.
Currently, I'm trying to build the documentation, but I repeatedly encounter errors such as `Unable to register cuDNN factory:` as well as library installation errors. I would like to know if I am missing any steps or if all these library installations are necessary for making the changes.
EDIT: Got it up and running; I had to install all the libraries to make it build successfully. I initially doubted the need to install libraries such as flax, but it seems they have to be installed too.
Hey @stevhliu, I would like to work on the phi3 model card
> To the folks who have been raising PRs so far, just a quick question: did you have to install flax, tf-keras, sentencepiece, etc.? Before making the changes, I'm trying to set up the environment following the steps here: https://github.com/huggingface/transformers/tree/main/docs. Currently, I'm trying to build the documentation, but I repeatedly encounter errors such as `Unable to register cuDNN factory:` as well as library installation errors. I would like to know if I am missing any steps or if all these library installations are necessary for making the changes.
As you're just going to edit the docs, you don't need a complete development setup. Fork the transformers repo, check out a new branch, and start updating the Markdown document of your choice in the docs/source/en/model_doc directory.
@stevhliu I have updated the model card for the dinov2 model
PR: #37104
Hey @stevhliu, I would love to do layoutlmv3.
Big thanks for everyone's interest so far and patience as I review your PRs! 🤗
A few tips and reminders:
- @shubham0204's suggestion is probably the easiest and fastest way to get started! You don't necessarily need to build the docs locally since the doc-builder will create a preview on your PR.
- Try to copy/paste as much as you can (where applicable), including the language, from the new model cards (for example, see Gemma 3) to make things easier!
Hi @stevhliu, I can work on vision encoder decoder. Thank you.
Hi @stevhliu, I can work on distilbert.
Hi @stevhliu, I can work on Falcon, thanks!
Hi, I would like to work on Qwen2
Hi @stevhliu, I would like to work on model card for resnet
Hi @stevhliu, now going for falcon_mamba.
Hi @stevhliu, I have updated the model card for the Autoformer model. PR: https://github.com/huggingface/transformers/pull/37231
Hi @stevhliu, can I work on openai?
Hi @stevhliu, may I work on GPT-2?
Hi @stevhliu I can work on MobileBERT