multimodal-learning topic
CoVA-Web-Object-Detection
A Context-aware Visual Attention-based training pipeline for Object Detection from a Webpage screenshot!
valhalla-nmt
Code repository for CVPR 2022 paper "VALHALLA: Visual Hallucination for Machine Translation"
FrozenBiLM
[NeurIPS 2022] Zero-Shot Video Question Answering via Frozen Bidirectional Language Models
PLMPapers
A paper list of pre-trained language models (PLMs).
AdaMML
Official implementation of AdaMML. https://arxiv.org/abs/2105.05165.
MultiViz
[ICLR 2023] MultiViz: Towards Visualizing and Understanding Multimodal Models
HVPNeT
[NAACL 2022 Findings] Good Visual Guidance Makes A Better Extractor: Hierarchical Visual Prefix for Multimodal Entity and Relation Extraction
mrg
Code for the paper "Multimodal Review Generation for Recommender Systems", WWW'19
Job-Recommend-Competition
🥇 First-place solution for the KNOW-based job recommendation algorithm competition 🥇
visually-informed-embedding-of-word-VIEW-
Visually informed embedding of words (VIEW) is a tool for transferring multimodal background knowledge to NLP algorithms.