cross-modal-pretraining topic
cross-modal-pretraining repositories
Video-LLaMA
2.7k stars, 242 forks, 15 watchers
[EMNLP 2023 Demo] Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding
RLIP
71 stars, 3 forks (watcher count not given)
[NeurIPS 2022 Spotlight] RLIP: Relational Language-Image Pre-training and a series of other methods to solve HOI detection and Scene Graph Generation.