multi-modality topic
clip-as-service
🏄 Scalable embedding, reasoning, ranking for images and sentences with CLIP
jina
☁️ Build multimodal AI applications with cloud-native stack
mmMOT
[ICCV2019] Robust Multi-Modality Multi-Object Tracking
deep-daze
Simple command-line tool for text-to-image generation using OpenAI's CLIP and Siren (an implicit neural representation network). The technique was originally created by https://twitter.com/advadnoun
UVTR
Unifying Voxel-based Representation with Transformer for 3D Object Detection (NeurIPS 2022)
CVPR21Chal-SLR
This repo contains the official code of our work SAM-SLR, which won the CVPR 2021 Challenge on Large Scale Signer Independent Isolated Sign Language Recognition.
CRIS.pytorch
An official PyTorch implementation of the CRIS paper
ComposeAE
Official code for WACV 2021 paper - Compositional Learning of Image-Text Query for Image Retrieval
TRAR-VQA
[ICCV 2021] Official implementation of the paper "TRAR: Routing the Attention Spans in Transformers for Visual Question Answering"
Awesome-Multimodal-Large-Language-Models
✨✨ Latest Advances on Multimodal Large Language Models