multimodality topic
Generative-AI
[TPAMI 2023] Multimodal Image Synthesis and Editing: The Generative AI Era
PALI3
Implementation of PALI3 from the paper "PALI-3 VISION LANGUAGE MODELS: SMALLER, FASTER, STRONGER"
prml
Multimodal Fully Convolutional Neural Networks for Semantic Segmentation.
swarms-pytorch
Swarming algorithms like PSO, Ant Colony, Sakana, and more in PyTorch 😊
PALI
Democratization of "PaLI: A Jointly-Scaled Multilingual Language-Image Model"
guidance-for-multi-omics-and-multi-modal-data-integration-and-analysis-on-aws
This guidance creates a scalable environment in AWS to prepare genomic, clinical, mutation, expression and imaging data for large-scale analysis and perform interactive queries against a data lake. Th...
NaViT
My implementation of "Patch n’ Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution"
maidr
Multimodal Access and Interactive Data Representation
RAG-Survey
A collection of awesome papers on RAG for AIGC. We propose a taxonomy of RAG foundations, enhancements, and applications in the paper "Retrieval-Augmented Generation for AI-Generated Content: A Survey".
Woodpecker
✨✨Woodpecker: Hallucination Correction for Multimodal Large Language Models. The first work to correct hallucinations in MLLMs.