qwen-vl topic
PaddleMIX
Paddle Multimodal Integration and eXploration, supporting mainstream multimodal tasks, including end-to-end large-scale multimodal pretrained models and a diffusion model toolbox. Equipped with high per...
awesome-vlm-architectures
Famous Vision Language Models and Their Architectures
webmarker
Mark web pages for use with vision-language models
lmms-finetune
A minimal codebase for finetuning large multimodal models, supporting llava-1.5/1.6, llava-interleave, llava-next-video, llava-onevision, qwen-vl, qwen2-vl, phi3-v etc.
Vision-Language-Models-Overview
A cutting-edge collection and survey of vision-language model papers and model GitHub repositories. Continuously updated.
Vision-SR1
Reinforcement Learning of Vision Language Models with Self Visual Perception Reward