self-operating-computer
A framework to enable multimodal models to operate a computer.
CogVLM is a Python-based open-source multimodal model. It is significantly better than LLaVA, especially at identifying elements on a screen; in fact, it excels at that part. CogVLM is...
Although the installation process completed without any errors, I encountered an issue when attempting to launch run.sh. The error message I received states: install_log.txt "Unable to activate the virtual environment."...
After running the `operate` command, I get this message: "To use `gpt-4-vision-preview` add an OpenAI API key". I wrote my API key in config.py, but I still get the message...
[Question] How can I integrate a third-party API? Is it enough to change the client.base_url in config.py to the URL of the third-party API, such as client.base_url = "https://api.example.com" ?...
## What does this PR do? Fixes #155 ## Requirement/Documentation - If there is a requirement document, please, share it here. ## Type of change ~~- [ ] Bug fix...
### Is your feature request related to a problem? Please describe. A cleaner way to manage dependencies is crucial at the beginning of a project to reduce technical debt and...
A known issue is that the detected position of the mouse is not accurate. As a workaround, could it be calibrated? A screenshot could be captured, the mouse...
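The calibration procedure proposed in this issue is cut off above, so the following is only one possible reading of the idea, not the project's method. It is a minimal sketch assuming the error along each axis is linear (a scale plus an offset) and that two reference points are available where both the detected and the true coordinate are known. The names `AxisCalibration` and `fit_axis` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AxisCalibration:
    """Linear correction for one screen axis: actual = scale * detected + offset."""

    scale: float
    offset: float

    def correct(self, detected: float) -> float:
        # Map a detected coordinate to the estimated true coordinate.
        return self.scale * detected + self.offset


def fit_axis(detected_a: float, actual_a: float,
             detected_b: float, actual_b: float) -> AxisCalibration:
    """Fit the linear correction from two reference points on one axis.

    Assumption (not from the issue): the detection error is linear,
    so two distinct reference points are enough to determine it.
    """
    if detected_a == detected_b:
        raise ValueError("reference points must differ on this axis")
    scale = (actual_b - actual_a) / (detected_b - detected_a)
    offset = actual_a - scale * detected_a
    return AxisCalibration(scale=scale, offset=offset)


if __name__ == "__main__":
    # Hypothetical measurements: the model reports x=100 and x=900
    # where the cursor truly sits at x=110 and x=990.
    x_cal = fit_axis(100, 110, 900, 990)
    print(x_cal.correct(500))
```

The same fit would be run separately for the y axis. If the real error is not linear, for example because of display scaling that varies across monitors, two points will not be enough and this sketch does not apply.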
### Is your feature request related to a problem? Please describe. The calls to OpenAI are wrapped in an async function, but the OpenAI client is still the synchronous...
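To illustrate the concern raised in this request: declaring a function `async def` does not make a blocking call inside it cooperative, so the event loop still stalls while the call runs. The sketch below uses a hypothetical `blocking_completion` function, a stand-in that sleeps instead of calling any real API, and compares it with offloading the same call via the standard library's `asyncio.to_thread` (Python 3.9+). This is not the project's code. Switching to the `AsyncOpenAI` client from the `openai` package would be another way to address the same problem.

```python
import asyncio
import time


def blocking_completion(prompt: str) -> str:
    """Hypothetical stand-in for a synchronous client call (no real API used)."""
    time.sleep(0.2)  # simulates network latency and blocks the calling thread
    return f"response to {prompt!r}"


async def wrapped_but_blocking(prompt: str) -> str:
    # The function is async, but the call inside blocks the event loop,
    # so concurrent tasks end up running one after another.
    return blocking_completion(prompt)


async def offloaded(prompt: str) -> str:
    # Run the blocking call in a worker thread so the event loop stays free.
    return await asyncio.to_thread(blocking_completion, prompt)


async def timed(coro_fn, prompts):
    """Run coro_fn over all prompts concurrently; return (results, seconds)."""
    start = time.perf_counter()
    results = await asyncio.gather(*(coro_fn(p) for p in prompts))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    prompts = ["a", "b", "c"]
    _, serial_time = asyncio.run(timed(wrapped_but_blocking, prompts))
    _, thread_time = asyncio.run(timed(offloaded, prompts))
    print(f"async wrapper around sync call: {serial_time:.2f}s")
    print(f"offloaded with to_thread:       {thread_time:.2f}s")
```

With three prompts, the first variant should take roughly three times the simulated latency, while the offloaded variant should take roughly one, since the three calls overlap in worker threads.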