multi-modal-learning topic
hcaptcha-challenger
🥂 Gracefully face hCaptcha challenges with an MoE (ONNX) embedded solution.
japanese-clip
Japanese CLIP by rinna Co., Ltd.
TRAR-VQA
[ICCV 2021] Official implementation of the paper "TRAR: Routing the Attention Spans in Transformers for Visual Question Answering"
multimodal-emotion-analysis-in-conversations
Multimodal analysis of sentiment and emotion in multi-speaker conversations.
SAM-SLR-v2
SAM-SLR-v2 is an improved version of SAM-SLR for sign language recognition.
WSS-CMER
Code for the paper "Weakly supervised segmentation with cross-modality equivariant constraints", available at https://arxiv.org/pdf/2104.02488.pdf
prismer
The implementation of "Prismer: A Vision-Language Model with Multi-Task Experts".
Macaw-LLM
Macaw-LLM: Multi-Modal Language Modeling with Image, Video, Audio, and Text Integration
CVPR-2023-24-Papers
CVPR 2023-2024 Papers: Dive into advanced research presented at the leading computer vision conference. Keep up to date with the latest developments in computer vision and deep learning. Code included...
Achelous
Achelous: A Fast Unified Water-surface Panoptic Perception Framework based on Fusion of Monocular Camera and 4D mmWave Radar