multimodality topic
Cradle
The Cradle framework is a first attempt at General Computer Control (GCC). Cradle helps agents ace any computer task by enabling strong reasoning abilities, self-improvement, and skill curation,...
MMStar
[NeurIPS 2024] This repo contains evaluation code for the paper "Are We on the Right Way for Evaluating Large Vision-Language Models"
GenerateU
[CVPR 2024] Generative Region-Language Pretraining for Open-Ended Object Detection
Awesome-LLMs-meet-Multimodal-Generation
🔥🔥🔥 A curated list of papers on LLM-based multimodal generation (image, video, 3D, and audio).
pyWikiMM
Collects a multimodal dataset of Wikipedia articles and their images
MileBench
This repo contains evaluation code for the paper "MileBench: Benchmarking MLLMs in Long Context"
Ovis
A novel Multimodal Large Language Model (MLLM) architecture, designed to structurally align visual and textual embeddings.
5pils
Code associated with the EMNLP 2024 Main paper "'Image, tell me your story!' Predicting the original meta-context of visual misinformation".