multimodal-large-language-models topic
Chinese-CLIP-opencv-onnxrun
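Deploys Chinese CLIP with OpenCV and onnxruntime for text-to-image search: given a one-sentence description of the desired image, it retrieves matching images from a gallery. Includes both C++ and Python versions.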
Gemini-Commonsense-Evaluation
Official implementation of "Gemini in Reasoning: Unveiling Commonsense in Multimodal Large Language Models"
MM-InstructEval
This repository contains code to evaluate various multimodal large language models using different instructions across multiple multimodal content comprehension tasks.
pytorch_mgie
A Gradio demo of MGIE
Woodpecker
✨✨Woodpecker: Hallucination Correction for Multimodal Large Language Models. The first work to correct hallucinations in MLLMs.
DrugLAMP
A PyTorch-based system for highly accurate drug-target interaction predictions utilizing multi-modal large language models to discern structural affinities in drug-target pairs.
ComfyUI-Hangover-Moondream
Moondream is a lightweight multimodal large language model.
Bunny
A family of lightweight multimodal models.
MLM_Filter
Official implementation of our paper "Finetuned Multimodal Language Models are High-Quality Image-Text Data Filters".
mPLUG-DocOwl
mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding