lmm topic
multimodal-maestro
Effective prompting for Large Multimodal Models like GPT-4 Vision, LLaVA or CogVLM. 🔥
HallusionBench
[CVPR'24] HallusionBench: You See What You Think? Or You Think What You See? An Image-Context Reasoning Benchmark Challenging for GPT-4V(ision), LLaVA-1.5, and Other Multi-modality Models
Video-LLaVA
PG-Video-LLaVA: Pixel Grounding in Large Multimodal Video Models
groundingLMM
[CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses that are seamlessly integrated with object segmentation masks.
InternLM-XComposer
InternLM-XComposer2 is a groundbreaking vision-language large model (VLLM) excelling in free-form text-image composition and comprehension.
awesome-Large-MultiModal-Hallucination
😎 An up-to-date, curated list of papers, methods & resources on LMM hallucination.
Cradle
The Cradle framework is a first attempt at General Computer Control (GCC). Cradle enables agents to ace any computer task through strong reasoning abilities, self-improvement, and skill curation,...
LLaVA-CLI-with-multiple-images
LLaVA inference with multiple images at once for cross-image analysis.
graphist
Official repo of Graphist.