large-multimodal-models topic
Awesome_Matching_Pretraining_Transfering
The Paper List of Large Multi-Modality Model, Parameter-Efficient Finetuning, Vision-Language Pretraining, Conventional Image-Text Matching for Preliminary Insight.
BenchLMM
[ECCV 2024] BenchLMM: Benchmarking Cross-style Visual Capability of Large Multimodal Models
MMMU
This repo contains evaluation code for the paper "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI"
LLaVA-Plus-Codebase
LLaVA-Plus: Large Language and Vision Assistants that Plug and Learn to Use Skills
awesome-multimodal-in-medical-imaging
A collection of resources on applications of multi-modal learning in medical imaging.
OpenAdapt
Open Source Generative Process Automation (i.e. Generative RPA). AI-First Process Automation with Large ([Language (LLMs) / Action (LAMs) / Multimodal (LMMs)] / Visual Language (VLMs)) Models
multi_token
Embed arbitrary modalities (images, audio, documents, etc) into large language models.
LEGENT
Open Platform for Embodied Agents
TinyLLaVA_Factory
A Framework of Small-scale Large Multimodal Models
MMStar
[NeurIPS 2024] This repo contains evaluation code for the paper "Are We on the Right Way for Evaluating Large Vision-Language Models"