multimodal-large-language-models topic
BLINK_Benchmark
[ECCV 2024] This repo contains evaluation code for the paper "BLINK: Multimodal Large Language Models Can See but Not Perceive" (https://arxiv.org/abs/2404.12390)
VisualAgentBench
Towards Large Multimodal Models as Visual Foundation Agents
MiCo
Explore the Limits of Omni-modal Pretraining at Scale
HolmesVAD
Official implementation of "Holmes-VAD: Towards Unbiased and Explainable Video Anomaly Detection via Multi-modal LLM"
multimodal-needle-in-a-haystack
Code and data for the benchmark "Multimodal Needle in a Haystack (MMNeedle): Benchmarking Long-Context Capability of Multimodal Large Language Models"
5pils
Code associated with the EMNLP 2024 main conference paper: "Image, tell me your story! Predicting the original meta-context of visual misinformation"
AlignGPT
Official repo for "AlignGPT: Multi-modal Large Language Models with Adaptive Alignment Capability"
UMBRAE
[ECCV 2024] UMBRAE: Unified Multimodal Brain Decoding | Unveiling the 'Dark Side' of Brain Modality
VideoHallucer
VideoHallucer: the first comprehensive benchmark for hallucination detection in large video-language models (LVLMs)
RULE
[EMNLP'24] RULE: Reliable Multimodal RAG for Factuality in Medical Vision Language Models