gpt4v topic
AppAgent
AppAgent: Multimodal Agents as Smartphone Users, an LLM-based multimodal agent framework designed to operate smartphone apps.
tarsier
Vision utilities for web interaction agents 👀
vscode-ui-sketcher
Draw your projects to life
sketch2app
The ultimate sketch-to-code app, built with GPT4o and serving 25k+ users. Choose your desired framework (React, Next, React Native, Flutter) for your app. It will instantly generate code and preview (sandb...
Awesome-Multimodal-Prompts
Prompts for GPT-4V & DALL-E3 that make full use of their multimodal abilities. GPT4V Prompts, DALL-E3 Prompts.
amazing-openai-api
Convert different model APIs into the OpenAI API format out of the box.
InternLM-XComposer
InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output
MobileAgent
Mobile-Agent: The Powerful Mobile Device Operation Assistant Family
GPT4-Vision-React-Starter
Early Alpha Release: Chat with Your Image - Leveraging GPT-4 Vision and Function Calls for AI-Powered Image Analysis and Description
WebcamGPT-Vision
Lightweight GPT-4 Vision processing over the Webcam