visual-language-models topic
ROSGPT_Vision
Commanding robots using only language model prompts
CogVLM
A state-of-the-art open visual language model | multimodal pretrained model
language-conditioned-robot-manipulation-models
https://arxiv.org/abs/2312.10807
AlignGPT
Official repo for "AlignGPT: Multi-modal Large Language Models with Adaptive Alignment Capability"
crab
CRAB: Cross-environment Agent Benchmark for Multimodal Language Model Agents. https://crab.camel-ai.org/
VCR
Official Repo for the paper: VCR: Visual Caption Restoration. Check arxiv.org/pdf/2406.06462 for details.
CoN-CLIP
Implementation of the "Learn No to Say Yes Better" paper.
HOI-Ref
Code for the paper "HOI-Ref: Hand-Object Interaction Referral in Egocentric Vision"
wildclip
Scene and animal attribute retrieval from camera trap data with domain-adapted vision-language models