Repositories under the vision-language topic
BLIP
PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
OFA
Official repository of OFA (ICML 2022). Paper: OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework
movienet-tools
Tools for movie and video research
Kaleido-BERT
💐Kaleido-BERT: Vision-Language Pre-training on Fashion Domain
cliport
CLIPort: What and Where Pathways for Robotic Manipulation
Vision-Language-Transformer
[ICCV2021 & TPAMI2023] Vision-Language Transformer and Query Generation for Referring Segmentation
pix2seq
Pix2Seq codebase: multi-tasks with generative modeling (autoregressive and diffusion)
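The Pix2Seq line of work casts vision tasks such as detection as sequence generation: continuous box coordinates are quantized into a discrete token vocabulary that a generative model can emit. A minimal sketch of that coordinate-tokenization step, assuming a hypothetical bin count and token layout (not the repo's exact configuration):

```python
def box_to_tokens(box, image_size, num_bins=1000):
    """Quantize a box (ymin, xmin, ymax, xmax), given in pixels,
    into discrete integer tokens in [0, num_bins - 1].
    num_bins is an illustrative choice, not the codebase's setting."""
    coords = [min(max(c / image_size, 0.0), 1.0) for c in box]  # normalize to [0, 1]
    return [int(round(c * (num_bins - 1))) for c in coords]

def tokens_to_box(tokens, image_size, num_bins=1000):
    """Invert the quantization back to pixel coordinates."""
    return [t / (num_bins - 1) * image_size for t in tokens]

# Round trip: quantization error is bounded by half a bin width.
box = (10.0, 20.0, 300.0, 400.0)
tokens = box_to_tokens(box, image_size=640)
recovered = tokens_to_box(tokens, image_size=640)
```

With this discretization, a box plus a class token becomes an ordinary token sequence, so one autoregressive (or diffusion) model can share a vocabulary across tasks.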
calvin
CALVIN - A benchmark for Language-Conditioned Policy Learning for Long-Horizon Robot Manipulation Tasks
vse_infty
Code for "Learning the Best Pooling Strategy for Visual Semantic Embedding", CVPR 2021
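The pooling strategy here refers to how a set of region or token features is aggregated into a single embedding vector before computing image-text similarity. As a hedged illustration of the design space (this is generic generalized-mean pooling, not the learned pooling operator from the paper itself):

```python
import numpy as np

def generalized_mean_pool(features: np.ndarray, p: float = 3.0) -> np.ndarray:
    """Pool token features of shape (n, d) into one embedding of shape (d,).

    p = 1 recovers mean pooling; large p approaches max pooling.
    Assumes non-negative features (e.g. post-ReLU activations).
    """
    return np.mean(features ** p, axis=0) ** (1.0 / p)

# Non-negative toy features standing in for region descriptors.
tokens = np.abs(np.random.default_rng(0).normal(size=(7, 4)))
mean_like = generalized_mean_pool(tokens, p=1.0)   # equals plain mean pooling
max_like = generalized_mean_pool(tokens, p=64.0)   # close to max pooling
```

Treating the pooling exponent as a tunable (or learnable) parameter lets a model interpolate between averaging all regions and focusing on the most salient one.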
ContraCLIP
Authors' official PyTorch implementation of "ContraCLIP: Interpretable GAN generation driven by pairs of contrasting sentences".