multimodality topic

List of multimodality repositories

Cradle

1.8k
Stars
160
Forks
21
Watchers

The Cradle framework is a first attempt at General Computer Control (GCC). Cradle supports agents to ace any computer task by enabling strong reasoning abilities, self-improvement, and skill curation,...

MMStar

144
Stars
5
Forks

[NeurIPS 2024] This repo contains evaluation code for the paper "Are We on the Right Way for Evaluating Large Vision-Language Models?"

GenerateU

128
Stars
6
Forks

[CVPR 2024] Generative Region-Language Pretraining for Open-Ended Object Detection

Awesome-LLMs-meet-Multimodal-Generation

322
Stars
17
Forks

🔥🔥🔥 A curated list of papers on LLM-based multimodal generation (image, video, 3D, and audio).

pyWikiMM

15
Stars
2
Forks

Collects a multimodal dataset of Wikipedia articles and their images

MileBench

23
Stars
1
Forks

This repo contains evaluation code for the paper "MileBench: Benchmarking MLLMs in Long Context"

Ovis

347
Stars
18
Forks

A novel Multimodal Large Language Model (MLLM) architecture, designed to structurally align visual and textual embeddings.

5pils

29
Stars
0
Forks

Code associated with the EMNLP 2024 Main paper: "Image, tell me your story!" Predicting the original meta-context of visual misinformation.