multimodal-large-language-models topic

Repositories tagged with the multimodal-large-language-models topic

BLINK_Benchmark
102 stars · 6 forks

This repo contains evaluation code for the paper "BLINK: Multimodal Large Language Models Can See but Not Perceive". https://arxiv.org/abs/2404.12390 [ECCV 2024]

VisualAgentBench
92 stars · 1 fork

Towards Large Multimodal Models as Visual Foundation Agents

MiCo
84 stars · 4 forks

Explore the Limits of Omni-modal Pretraining at Scale

HolmesVAD
66 stars · 2 forks

Official implementation of "Holmes-VAD: Towards Unbiased and Explainable Video Anomaly Detection via Multi-modal LLM"

multimodal-needle-in-a-haystack
33 stars · 0 forks

Code and data for the benchmark "Multimodal Needle in a Haystack (MMNeedle): Benchmarking Long-Context Capability of Multimodal Large Language Models"

5pils
29 stars · 0 forks

Code associated with the EMNLP 2024 main-conference paper: "Image, tell me your story!" Predicting the original meta-context of visual misinformation.

AlignGPT
29 stars · 3 forks

Official repo for "AlignGPT: Multi-modal Large Language Models with Adaptive Alignment Capability"

UMBRAE
24 stars · 2 forks

[ECCV 2024] UMBRAE: Unified Multimodal Brain Decoding | Unveiling the 'Dark Side' of Brain Modality

VideoHallucer
38 stars · 0 forks · 38 watchers

VideoHallucer: the first comprehensive benchmark for hallucination detection in large video-language models (LVLMs)

RULE
21 stars · 0 forks

[EMNLP'24] RULE: Reliable Multimodal RAG for Factuality in Medical Vision Language Models