multimodal-large-language-models topic
BLINK_Benchmark
[ECCV 2024] This repo contains evaluation code for the paper "BLINK: Multimodal Large Language Models Can See but Not Perceive" (https://arxiv.org/abs/2404.12390)
VisualAgentBench
Towards Large Multimodal Models as Visual Foundation Agents
MiCo
Explore the Limits of Omni-modal Pretraining at Scale
HolmesVAD
Official implementation of "Holmes-VAD: Towards Unbiased and Explainable Video Anomaly Detection via Multi-modal LLM"
multimodal-needle-in-a-haystack
Code and data for the benchmark "Multimodal Needle in a Haystack (MMNeedle): Benchmarking Long-Context Capability of Multimodal Large Language Models"
5pils
Code associated with the EMNLP 2024 main conference paper: "Image, tell me your story! Predicting the original meta-context of visual misinformation"
AlignGPT
Official repo for "AlignGPT: Multi-modal Large Language Models with Adaptive Alignment Capability"
UMBRAE
[ECCV 2024] UMBRAE: Unified Multimodal Brain Decoding | Unveiling the 'Dark Side' of Brain Modality
VideoHallucer
VideoHallucer: the first comprehensive benchmark for hallucination detection in large video-language models (LVLMs)
RULE
[EMNLP'24] RULE: Reliable Multimodal RAG for Factuality in Medical Vision Language Models