LVLM topic
ms-swift
Use PEFT or full-parameter training to fine-tune 350+ LLMs or 90+ MLLMs. (LLM: Qwen2.5, Llama3.2, GLM4, Internlm2.5, Yi1.5, Mistral, Baichuan2, DeepSeek, Gemma2, ...; MLLM: Qwen2-VL, Qwen2-Audio, Llama3.2-Visio...
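For context on the kind of parameter-efficient fine-tuning ms-swift automates, below is a minimal sketch using the Hugging Face `transformers` and `peft` libraries directly; the model name and LoRA hyperparameters are illustrative assumptions, not ms-swift defaults or its API.

```python
# Minimal LoRA setup with the peft library, illustrating the style of
# parameter-efficient fine-tuning that ms-swift wraps for many models.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_name = "Qwen/Qwen2.5-7B-Instruct"  # illustrative choice
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Attach low-rank adapters to the attention projections; only the adapter
# weights are trained while the base model stays frozen.
lora_config = LoraConfig(
    r=8,                      # adapter rank (assumed value)
    lora_alpha=16,            # scaling factor (assumed value)
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # reports how few parameters LoRA trains
```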
LightCompress
[EMNLP 2024 & AAAI 2026] A powerful toolkit for compressing large models, including LLMs, VLMs, and video generation models.
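As background on what weight compression involves, here is a minimal sketch of per-channel symmetric int8 weight quantization in PyTorch; it illustrates the general idea only and is not LightCompress's API or algorithm.

```python
# Conceptual per-channel symmetric int8 weight quantization (not LightCompress code).
import torch

def quantize_per_channel_int8(weight: torch.Tensor):
    """Quantize a 2-D weight matrix to int8 with one scale per output channel."""
    # Scale so the largest magnitude in each row maps to 127.
    max_abs = weight.abs().amax(dim=1, keepdim=True).clamp(min=1e-8)
    scale = max_abs / 127.0
    q = torch.clamp(torch.round(weight / scale), -128, 127).to(torch.int8)
    return q, scale

def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.float() * scale

w = torch.randn(4096, 4096)
q, scale = quantize_per_channel_int8(w)
print("max abs error:", (dequantize(q, scale) - w).abs().max().item())
```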
MMStar
[NeurIPS 2024] This repo contains evaluation code for the paper "Are We on the Right Way for Evaluating Large Vision-Language Models"
Awesome-LLMs-meet-Multimodal-Generation
🔥🔥🔥 A curated list of papers on LLMs-based multimodal generation (image, video, 3D and audio).
Eagle
Eagle: Frontier Vision-Language Models with Data-Centric Strategies
Awesome-LVLM-Hallucination
An up-to-date curated list of state-of-the-art research on hallucinations in large vision-language models, with papers and resources
AISurveyPapers
Large Visual Language Model (LVLM), Large Language Model (LLM), Multimodal Large Language Model (MLLM), Alignment, Agent, AI System, Survey
VoRA
[Fully open] [Encoder-free MLLM] Vision as LoRA
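To make the "encoder-free, vision as LoRA" idea concrete, the sketch below shows one simplified reading of it: raw image patches are linearly projected into the LLM's token space, and visual capability is carried by low-rank adapters on the LLM's own linear layers. All class and parameter names are invented for illustration and do not reflect the VoRA codebase.

```python
# Conceptual sketch of "vision as LoRA": no separate vision encoder; image
# patches are projected into the LLM embedding space and LoRA adapters on the
# LLM's own linear layers absorb the visual capability. Names are illustrative.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen linear layer plus a trainable low-rank update (W + B @ A)."""
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False
        self.lora_a = nn.Linear(base.in_features, rank, bias=False)
        self.lora_b = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.lora_b.weight)   # adapters start as a zero update
        self.scaling = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scaling * self.lora_b(self.lora_a(x))

class PatchEmbed(nn.Module):
    """Flatten non-overlapping image patches and project them to the LLM width."""
    def __init__(self, patch: int = 14, dim: int = 4096):
        super().__init__()
        self.proj = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)

    def forward(self, images):                        # (B, 3, H, W)
        x = self.proj(images)                         # (B, dim, H/p, W/p)
        return x.flatten(2).transpose(1, 2)           # (B, num_patches, dim)

# Visual tokens are prepended to the text token embeddings, and the
# LoRA-adapted LLM processes the joint sequence.
embed = PatchEmbed(dim=4096)
vision_tokens = embed(torch.randn(1, 3, 224, 224))    # (1, 256, 4096)
print(vision_tokens.shape)
```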
OpenThinkIMG
OpenThinkIMG is an end-to-end open-source framework that empowers LVLMs to think with images.
FrameFusion
[ICCV'25] The official code of the paper "Combining Similarity and Importance for Video Token Reduction on Large Visual Language Models"
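As a rough illustration of combining similarity and importance for token reduction, the sketch below merges temporally adjacent, highly similar video tokens and then drops the least important survivors. The importance proxy, thresholds, and function names are assumptions for illustration; this is not the official FrameFusion implementation.

```python
# Conceptual video-token reduction: merge adjacent tokens that are highly
# similar, then prune the remainder by an importance score.
import torch
import torch.nn.functional as F

def reduce_tokens(tokens, importance, sim_threshold=0.9, keep_ratio=0.5):
    """tokens: (N, D) visual tokens in temporal order; importance: (N,) scores."""
    # 1) Similarity stage: fold each token into its predecessor when their
    #    cosine similarity exceeds the threshold (cheap redundancy removal).
    sims = F.cosine_similarity(tokens[:-1], tokens[1:], dim=-1)        # (N-1,)
    merged, scores = [tokens[0]], [importance[0]]
    for i in range(1, tokens.size(0)):
        if sims[i - 1] > sim_threshold:
            merged[-1] = (merged[-1] + tokens[i]) / 2                  # merge
            scores[-1] = torch.maximum(scores[-1], importance[i])
        else:
            merged.append(tokens[i])
            scores.append(importance[i])
    merged, scores = torch.stack(merged), torch.stack(scores)

    # 2) Importance stage: keep only the top-scoring fraction, in original order.
    k = max(1, int(keep_ratio * merged.size(0)))
    keep = torch.topk(scores, k).indices.sort().values
    return merged[keep]

tokens = torch.randn(1024, 4096)     # e.g. tokens from many video frames
importance = torch.rand(1024)        # e.g. attention received from text tokens
print(reduce_tokens(tokens, importance).shape)
```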