multimodal-learning topic
LViT
[IEEE Transactions on Medical Imaging/TMI] This repo is the official implementation of "LViT: Language meets Vision Transformer in Medical Image Segmentation"
MSAF
Official implementation of the paper "MSAF: Multimodal Split Attention Fusion"
VIG
Dataset for Visually Indicated Sound Generation by Perceptually Optimized Classification
slp
Utilities and modules for speech, language, and multimodal processing using PyTorch and PyTorch Lightning
open_flamingo
An open-source framework for training large multimodal models.
OFASys
OFASys: A Multi-Modal Multi-Task Learning System for Building Generalist Models
UniRepLKNet
[CVPR'24] UniRepLKNet: A Universal Perception Large-Kernel ConvNet for Audio, Video, Point Cloud, Time-Series and Image Recognition
ICCV-2023-Papers
ICCV 2023 Papers: Discover cutting-edge research from ICCV 2023, the leading computer vision conference. Stay updated on the latest in computer vision and deep learning, with code included. ⭐ support...
XPretrain
Multi-modality pre-training
AMOS
AMOS: A Large-Scale Abdominal Multi-Organ Benchmark for Versatile Medical Image Segmentation