Add FastVLM from CVPR 2025
Model description
Apple recently released FastVLM, a vision-language model introduced at CVPR 2025 that significantly improves on earlier models in the LLaVA family.
The smallest FastVLM variant outperforms LLaVA-OneVision-0.5B, achieving:
- 85× faster Time-to-First-Token (TTFT)
- 3.4× smaller vision encoder
Its ultra-low latency enables real-time deployment on mobile and edge devices, as demonstrated in the official demo. Given its strong performance and efficiency, I believe FastVLM would be a valuable addition to this repo.
If you also think so, I'd love to contribute this model.
Open source status
- [x] The model implementation is available
- [x] The model weights are available
Provide useful links for the implementation
Paper: https://www.arxiv.org/abs/2412.13303
Official repo: https://github.com/apple/ml-fastvlm
Hey @kamila-chay !
Yes, adding the model is already planned afaik. @ariG23498 had a nice inference script using transformers under this comment. To support the model in the core library we need to convert the weights, and I think we're waiting on the Apple team for that
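For reference, running the model through the Hub's remote code (before core support lands) looks roughly like the sketch below. The repo id `apple/FastVLM-0.5B` and the LLaVA-style `<image>` placeholder are assumptions here, not confirmed details from this thread; check the model card for the exact prompt format.

```python
# Hedged sketch: FastVLM inference via trust_remote_code, assuming a
# LLaVA-style single-image prompt. Repo id is an assumption.
IMAGE_TOKEN = "<image>"


def build_prompt(question: str) -> str:
    """Compose a single-image, single-turn prompt with the image placeholder first."""
    return f"{IMAGE_TOKEN}\n{question}"


def run_inference(question: str) -> str:
    # Imports kept local so the helpers above stay importable without transformers.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    repo = "apple/FastVLM-0.5B"  # assumed id from the apple collection
    tokenizer = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(repo, trust_remote_code=True)
    inputs = tokenizer(build_prompt(question), return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=64)
    return tokenizer.decode(out[0], skip_special_tokens=True)
```

Call `run_inference("Describe this image.")` to try it end to end; the actual script linked above also wires in the image processor, which is omitted here.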
Hi @zucchini-nlp! Thank you for the answer, I didn't see the thread. Glad to hear it's already planned. You can mention me here if you need community support for this model in the future (to add new components etc., not quite sure what will be needed)
Now that it is public from Apple, I think we can start working on it.
Collection: https://huggingface.co/collections/apple/fastvlm-68ac97b9cd5cacefdd04872e
That's great to hear! I'm still willing to contribute, @ariG23498, let me know if I can open a PR and work on it. I have some free time now so I can start right away 💯
@kamila-chay that is great to hear.
@zucchini-nlp is that okay with you?
Sure, thanks @kamila-chay ! I haven't personally looked at the model arch; it might be compatible with existing llava or llava-next model code. Can you first check that and, if so, adapt a conversion script?
It makes long-term maintenance easier if we don't need to add a new model class.
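One quick way to check that compatibility is to see whether the original checkpoint's weight names can be renamed one-to-one into llava-style names. The sketch below shows the shape such a conversion script usually takes; the `ORIGINAL_TO_HF` prefixes are hypothetical and must be read off the actual released state dict.

```python
# Hedged sketch of a weight-name conversion pass, llava-style.
# The prefix mapping is a guess for illustration, not FastVLM's real layout.
import re

ORIGINAL_TO_HF = {
    r"^model\.vision_tower\.": "vision_tower.",
    r"^model\.mm_projector\.": "multi_modal_projector.",
    r"^model\.layers\.": "language_model.model.layers.",
    r"^lm_head\.": "language_model.lm_head.",
}


def convert_key(key: str) -> str:
    """Map one original weight name to its llava-style equivalent."""
    for pattern, replacement in ORIGINAL_TO_HF.items():
        if re.match(pattern, key):
            return re.sub(pattern, replacement, key)
    return key  # unmapped keys pass through; flag these during review


def convert_state_dict(state_dict: dict) -> dict:
    """Rename every tensor in the checkpoint; values are untouched."""
    return {convert_key(k): v for k, v in state_dict.items()}
```

If every key maps cleanly and the resulting model loads into an existing llava class with no shape mismatches, no new model class is needed.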
Sure I will do that!
Hi @zucchini-nlp, @ariG23498, I just wanted to check in to make sure we’re aligned. I noticed that remote code was added to the official Apple repos on the HF hub 2 days ago, and I wasn’t sure if you still plan to integrate this model into the core library in that case. There’s quite a bit of new code to bring in if we go that route, so I wanted to clarify what the plan is 😃
I am fine with bringing the code in transformers, though let's ask @ariG23498 about what we have agreed on with Apple team. Just in case anything changed over the weekend :)
Hi @ariG23498, any info on that?
We are also looking to enable this model on ExecuTorch - would love to see this model in Transformers! Any plans on integrating the HF hub modeling code into the main repo?
Hey @kamila-chay, sorry for the delay. It looks like we are on! You can start working on the FastVLM integration into transformers.
Looking forward to it @kamila-chay 🙏