video-text-retrieval topic
UniVL
An official implementation for "UniVL: A Unified Video and Language Pre-Training Model for Multimodal Understanding and Generation"
CLIP4Clip
An official implementation for "CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval"
Awesome_Matching_Pretraining_Transfering
A paper list covering large multi-modality models, parameter-efficient finetuning, vision-language pretraining, and conventional image-text matching, for preliminary insight.
ALPRO
Align and Prompt: Video-and-Language Pre-training with Entity Prompts
CondensedMovies
Story-Based Retrieval with Contextual Embeddings. Largest freely available movie video dataset. [ACCV'20]
crossmodal-contrastive-learning
CrossCLR: Cross-modal Contrastive Learning For Multi-modal Video Representations, ICCV 2021
MAC
An end-to-end masked contrastive video-and-language pre-training framework
Cross-Modal-Adapter
[arXiv] Cross-Modal Adapter for Text-Video Retrieval
Cap4Video
[CVPR'2023 Highlight & TPAMI] Cap4Video: What Can Auxiliary Captions Do for Text-Video Retrieval?
TESTA
[EMNLP 2023] TESTA: Temporal-Spatial Token Aggregation for Long-form Video-Language Understanding