vision-language-pretraining topic
LAVIS
LAVIS - A One-stop Library for Language-Vision Intelligence
Continual-CLIP
Official repository for "CLIP model is an Efficient Continual Learner".
protoclip
📍 Official PyTorch implementation of the paper "ProtoCLIP: Prototypical Contrastive Language Image Pretraining" (IEEE TNNLS)
Video-LLaMA
[EMNLP 2023 Demo] Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding
Recognize-Any-Regions
Recognize Any Regions
Video-ChatGPT
[ACL 2024 🔥] Video-ChatGPT is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted for...
FLM
Accelerating Vision-Language Pretraining with Free Language Modeling (CVPR 2023)
SegCLIP
PyTorch implementation of the ICML 2023 paper "SegCLIP: Patch Aggregation with Learnable Centers for Open-Vocabulary Semantic Segmentation"
svl_adapter
SVL-Adapter: Self-Supervised Adapter for Vision-Language Pretrained Models
COSA
Code and models for COSA: Concatenated Sample Pretrained Vision-Language Foundation Model