multimodal-large-language-models topic
Awesome_Matching_Pretraining_Transfering
The paper list of large multi-modality models (perception, generation, unification), parameter-efficient finetuning, vision-language pretraining, and conventional image-text matching for preliminary insigh...
MM-NIAH
The official implementation of the paper "Needle In A Multimodal Haystack".
EasyDetect
[ACL 2024] An Easy-to-use Hallucination Detection Framework for LLMs.
DynMoE
[Preprint] Dynamic Mixture of Experts: An Auto-Tuning Approach for Efficient Transformer Models
EmpathyEar
A multimodal empathetic chatbot.
Video-of-Thought
Code for the ICML 2024 paper "Video-of-Thought: Step-by-Step Video Reasoning from Perception to Cognition".
EVE
EVE Series: Encoder-Free Vision-Language Models from BAAI
cambrian
Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
CompBench
CompBench evaluates the comparative reasoning of multimodal large language models (MLLMs) with 40K image pairs and questions across 8 dimensions of relative comparison: visual attribute, existence, st...
Awesome-LVLM-Hallucination
An up-to-date curated list of state-of-the-art research on hallucinations in large vision-language models: papers and resources.