Repositories under the vision-and-language topic
Discrete-Continuous-VLN
Code and data for the CVPR 2022 paper "Bridging the Gap Between Learning in Discrete and Continuous Environments for Vision-and-Language Navigation"
TRAR-VQA
[ICCV 2021] Official implementation of the paper "TRAR: Routing the Attention Spans in Transformers for Visual Question Answering"
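The routing idea in the title can be sketched without the repo's code: compute self-attention under masks of several span sizes and mix the results with learned, input-conditioned routing weights. Everything below (function names, the soft-mixture formulation, the window sizes) is an illustrative assumption, not TRAR's actual implementation.

```python
import torch
import torch.nn.functional as F

def band_mask(n: int, window: int, device=None) -> torch.Tensor:
    # True where position j is within +/- `window` of position i.
    idx = torch.arange(n, device=device)
    return (idx[None, :] - idx[:, None]).abs() <= window

def routed_attention(q, k, v, router_logits, windows=(1, 4, 1_000_000)):
    # q, k, v: (batch, n, d); router_logits: (batch, len(windows)).
    n, d = q.shape[-2], q.shape[-1]
    scores = q @ k.transpose(-2, -1) / d ** 0.5
    weights = F.softmax(router_logits, dim=-1)  # one weight per span option
    out = torch.zeros_like(q)
    for w, window in zip(weights.unbind(-1), windows):
        mask = band_mask(n, window, device=scores.device)
        attn = F.softmax(scores.masked_fill(~mask, float("-inf")), dim=-1)
        out = out + w[..., None, None] * (attn @ v)
    return out
```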
ZeroVL
[ECCV 2022] Contrastive Vision-Language Pre-training with Limited Resources
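As a refresher on the underlying objective, here is a minimal CLIP-style symmetric contrastive (InfoNCE) loss. This is a generic sketch, not ZeroVL's code, and it omits the paper's resource-efficiency contributions; all names are placeholders.

```python
import torch
import torch.nn.functional as F

def clip_loss(image_emb, text_emb, temperature=0.07):
    # image_emb, text_emb: (batch, dim) embeddings of matched pairs.
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = image_emb @ text_emb.t() / temperature  # cosine similarities
    # Matched pairs sit on the diagonal; score both directions.
    targets = torch.arange(logits.size(0), device=logits.device)
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2
```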
LXMERT-AdvTrain
Research code for the NeurIPS 2020 spotlight paper "Large-Scale Adversarial Training for Vision-and-Language Representation Learning" (the LXMERT adversarial-training part)
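The core trick, adversarial training in embedding space, can be sketched independently of the repo: take a few gradient-ascent steps on a perturbation of the input embeddings, then also train on the perturbed loss. `loss_fn` and the hyperparameters below are placeholders, not the repo's interface.

```python
import torch

def embedding_adv_delta(loss_fn, embeds, eps=1e-2, step=1e-3, n_steps=3):
    # Find a small perturbation of `embeds` that increases the loss.
    delta = torch.zeros_like(embeds, requires_grad=True)
    for _ in range(n_steps):
        loss = loss_fn(embeds + delta)
        grad, = torch.autograd.grad(loss, delta)
        # Gradient ascent, then project back into the L-inf eps-ball.
        delta = (delta + step * grad.sign()).clamp(-eps, eps).detach()
        delta.requires_grad_(True)
    return delta.detach()

# Training would then minimize loss_fn(embeds) + loss_fn(embeds + delta).
```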
TVLT
PyTorch code for "TVLT: Textless Vision-Language Transformer" (NeurIPS 2022 Oral)
eccv-caption
Extended COCO Validation (ECCV) Caption dataset (ECCV 2022)
prismer
Implementation of "Prismer: A Vision-Language Model with Multi-Task Experts"
awesome-japanese-llm
Overview of Japanese LLMs (日本語LLMまとめ)
OFASys
OFASys: A Multi-Modal Multi-Task Learning System for Building Generalist Models
plip
Pathology Language and Image Pre-Training (PLIP) is the first vision-and-language foundation model for Pathology AI (Nature Medicine). PLIP is a large-scale pre-trained model that can be used to extract image and text embeddings.
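Since PLIP is distributed as a CLIP-style checkpoint, a typical zero-shot scoring call might look like the sketch below. The Hugging Face model id "vinid/plip", the file name, and the prompt texts are assumptions to verify against the repo.

```python
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("vinid/plip")        # assumed model id
processor = CLIPProcessor.from_pretrained("vinid/plip")

image = Image.open("patch.png")                        # a pathology image patch
inputs = processor(text=["an H&E image of tumor tissue",
                         "an H&E image of normal tissue"],
                   images=image, return_tensors="pt", padding=True)
outputs = model(**inputs)
print(outputs.logits_per_image.softmax(dim=-1))        # zero-shot label scores
```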