DV Lab
Video-P2P
Video-P2P: Video Editing with Cross-attention Control
Prompt-Highlighter
[CVPR 2024] Prompt Highlighter: Interactive Control for Multi-Modal LLMs
MOOD
Official PyTorch implementation of MOOD series: (1) MOODv1: Rethinking Out-of-distribution Detection: Masked Image Modeling Is All You Need. (2) MOODv2: Masked Image Modeling for Out-of-Distribution...
GroupContrast
[CVPR 2024] GroupContrast: Semantic-aware Self-supervised Representation Learning for 3D Understanding
LLaMA-VID
Official Implementation for LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models
LLMGA
This project is the official implementation of 'LLMGA: Multimodal Large Language Model based Generation Assistant', ECCV 2024 Oral
Mask-Attention-Free-Transformer
Official Implementation for "Mask-Attention-Free Transformer for 3D Instance Segmentation"
MGM
Official repo for "Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models"
MR-GSM8K
Challenge LLMs to Reason About Reasoning: A Benchmark to Unveil Cognitive Depth in LLMs