vision-and-language topic
hateful_memes-hate_detectron
Detecting Hate Speech in Memes Using Multimodal Deep Learning Approaches: Prize-winning solution to Hateful Memes Challenge. https://arxiv.org/abs/2012.12975
VLCAP
[ICIP 2022] VLCap: Vision-Language with Contrastive Learning for Coherent Video Paragraph Captioning
open-fashion-clip
Official repository for the paper "OpenFashionCLIP: Vision-and-Language Contrastive Learning with Open-Source Fashion Data" (ICIAP 2023).
MAC
An end-to-end masked contrastive video-and-language pre-training framework
DRFT
End-to-end Multi-modal Video Temporal Grounding, NeurIPS 2021
awesome-vision-language-models-for-earth-observation
A curated list of awesome vision-and-language resources for Earth observation.
RS5M
RS5M: a large-scale vision-language dataset for remote sensing
Cross-Modal-Adapter
[arXiv] Cross-Modal Adapter for Text-Video Retrieval
VidChapters
[NeurIPS 2023 D&B] VidChapters-7M: Video Chapters at Scale
EDA
[CVPR 2023] EDA: Explicit Text-Decoupling and Dense Alignment for 3D Visual Grounding