vision-language topic
awesome-vision-language-modeling
Recent Advances in Vision-Language Pre-training!
NExT-OE
NExT-QA: Next Phase of Question-Answering to Explaining Temporal Actions (CVPR'21)
rewrite
[NeurIPS 2023] Rewrite Caption Semantics: Bridging Semantic Gaps for Language-Supervised Semantic Segmentation
TrackGPT
TrackGPT: Track What You Need in Videos via Text Prompts
TinyLLaVA_Factory
A Framework of Small-scale Large Multimodal Models
ProText
[CVPRW 2024] Official repository of the paper "Learning to Prompt with Text Only Supervision for Vision-Language Models".
Sambor
Sambor: Boosting Segment Anything Model Towards Open-Vocabulary Learning
MEP-3M
🎁 A Large-scale Multi-modal E-Commerce Products Dataset (LTDL@IJCAI-21 Best Dataset; Pattern Recognition 2023)
DeCEMBERT
PyTorch version of DeCEMBERT: Learning from Noisy Instructional Videos via Dense Captions and Entropy Minimization (NAACL 2021)
VLMixer
VLMixer: Unpaired Vision-Language Pre-training via Cross-Modal CutMix (ICML 2022)