cross-modal topic
XFlow
Generalized cross-modal NNs; new audiovisual benchmark (IEEE TNNLS 2019)
examples
Examples of analyzing unstructured data with Towhee, such as reverse image search, reverse video search, audio classification, question-answering systems, and molecular search.
VLTVG
Improving Visual Grounding with Visual-Linguistic Verification and Iterative Reasoning, CVPR 2022
aaai17-cdq
Implementation of the AAAI-17 paper "Collective Deep Quantization for Efficient Cross-Modal Retrieval"
Xmodal-Ctx
Official PyTorch implementation of our CVPR 2022 paper: Beyond a Pre-Trained Object Detector: Cross-Modal Textual and Visual Context for Image Captioning
RIM
[CVPR 2023] Referring Image Matting
Text2Pos-CVPR2022
Code, dataset and models for our CVPR 2022 publication "Text2Pos"
ZeroVL
[ECCV2022] Contrastive Vision-Language Pre-training with Limited Resources
SOLC
Remote sensing SAR-optical land-use classification in PyTorch: high-resolution remote sensing semantic segmentation, land-cover segmentation, and land-cover classification
multimodal-maestro
Effective prompting for Large Multimodal Models like GPT-4 Vision, LLaVA or CogVLM. 🔥