Multi-Modal-Transformer
This repository collects a variety of multi-modal transformer architectures, including image transformers, video transformers, image-language transformers, video-language transformers and self-supervised l...
Reading list in Transformer
This repo aims to collect the recent popular Transformer papers, code and learning resources across the domains of Vision Transformers, NLP, multi-modal learning, etc.
Topics (paper and code)
- Image Transformer
- Video Transformer
- Video & Language & other modality Transformer
- Image & Language & other modality Transformer
- Natural Language Processing Transformer
- Efficient Transformer
- Model compression
- Self-Supervised Learning in Vision
- Other interesting papers in related domains
Review papers on multi-modal learning
- Video-language
Tutorials and workshops
- Cross-View and Cross-Modal Visual Geo-Localization: IEEE CVPR 2021 Tutorial
- From VQA to VLN: Recent Advances in Vision-and-Language Research: IEEE CVPR 2021 Tutorial
- Tutorial on MultiModal Machine Learning: IEEE CVPR 2022 Tutorial
Datasets
- Multi-modal Datasets
Blogs
Tools
- PyTorchVideo: a deep learning library for video understanding research
- horovod: a tool for multi-GPU parallel processing
- accelerate: an easy API for mixed precision and any kind of distributed computing