large-multimodal-models topic
VisualWebBench
Evaluation framework for paper "VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?"
TextCoT
The official repo for “TextCoT: Zoom In for Enhanced Multimodal Text-Rich Image Understanding”.
Awesome-Multimodal-Papers
A curated list of awesome multimodal studies.
MileBench
This repo contains evaluation code for the paper "MileBench: Benchmarking MLLMs in Long Context"
Open-LLaVA-NeXT
An open-source implementation for training LLaVA-NeXT.
IVM
[NeurIPS 2024] The official implementation of "Instruction-Guided Visual Masking"
OPERA
[CVPR 2024 Highlight] OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation
ShareGPT4Video
[NeurIPS 2024 D&B Track] An official implementation of ShareGPT4Video: Improving Video Understanding and Generation with Better Captions
ShareGPT4V
[ECCV 2024] ShareGPT4V: Improving Large Multi-modal Models with Better Captions
vhs_benchmark
🔥 Official Benchmark Toolkits for "Visual Haystacks: Answering Harder Questions About Sets of Images"