multimodal-large-language-models topic
PIIP
[NeurIPS 2024 Spotlight ⭐️ & TPAMI 2025] Parameter-Inverted Image Pyramid Networks (PIIP)
Awesome-Anomaly-Detection-Foundation-Models
A curated list of papers & resources on anomaly detection foundation models using large language models, vision-language models, graph foundation models, time series foundation models, etc.
SAIL
[CVPR 2025 Highlight] Official PyTorch codebase for the paper "Assessing and Learning Alignment of Unimodal Vision and Language Models"
awesome-vla-for-ad
🌐 A curated collection of vision-language-action (VLA) models for autonomous driving applications
srbench
Source code for the paper "Mind the Gap: Benchmarking Spatial Reasoning in Vision-Language Models"
Libra
[ACL 2025] ⚖️ Temporally-aware MLLM for Biomedical Radiology Analysis and Report Generation. Flexible toolkit with MLLM backbone support, real-time validation, training resumption, and smart model sav...
multimind-sdk
Your SDK solves all of this. One interface. Unified logic. Local + hosted models. Fine-tuning. Agent tools. Enterprise-ready. Hybrid RAG. Star 🌟 if you like it!
Video-Bench
Video Generation Benchmark
Awesome-Token-Merge-for-MLLMs
A paper list about Token Merge, Reduce, Resample, and Drop for MLLMs.
HALVA
[ICLR 2025] Data-Augmented Phrase-Level Alignment for Mitigating Object Hallucination