gpt4v topic
AppAgent
AppAgent: Multimodal Agents as Smartphone Users, an LLM-based multimodal agent framework designed to operate smartphone apps.
tarsier
Vision utilities for web interaction agents 👀
vscode-ui-sketcher
Draw your projects to life
sketch2app
The ultimate sketch to code app made using GPT4 vision. Choose your desired framework (React, Next, React Native, Flutter) for your app. It will instantly generate code and preview (sandbox) from a si...
Awesome-Multimodal-Prompts
Prompts of GPT-4V & DALL-E3 to full utilize the multi-modal ability. GPT4V Prompts, DALL-E3 Prompts.
amazing-openai-api
Convert different model APIs into the OpenAI API format out of the box.
InternLM-XComposer
InternLM-XComposer2 is a groundbreaking vision-language large model (VLLM) excelling in free-form text-image composition and comprehension.
MobileAgent
Mobile-Agent: Autonomous Multi-Modal Mobile Device Agent with Visual Perception
GPT4-Vision-React-Starter
Early Alpha Release: Chat with Your Image - Leveraging GPT-4 Vision and Function Calls for AI-Powered Image Analysis and Description
WebcamGPT-Vision
Lightweight GPT-4 Vision processing over the Webcam