self-operating-computer
A framework to enable multimodal models to operate a computer.
Facing this issue while running `operate --voice`. I have already run `brew install portaudio` and `pip3 install -r requirements-audio.txt`. OS: macOS Ventura 13.5.2
Is it possible to run this against a self-hosted large language model instead of OpenAI?
I noticed that you currently seem to apply a grid to the images to assist the vision model: - https://github.com/OthersideAI/self-operating-computer/blob/main/operate/main.py#L462-L527 And mention this in the README: > **Current Challenges** >...
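As a rough illustration of the grid idea mentioned above, here is a minimal sketch that only computes where grid lines and their percentage labels would fall on a screenshot. It is not the project's implementation in `operate/main.py`; the function names `grid_positions` and `grid_labels` and the `step` parameter are hypothetical, and actual drawing (for example with an imaging library) is left out.

```python
def grid_positions(width, height, step):
    """Return the x and y pixel positions of interior grid lines.

    Hypothetical helper: lines are placed every `step` pixels,
    excluding the image borders.
    """
    xs = list(range(step, width, step))
    ys = list(range(step, height, step))
    return xs, ys


def grid_labels(width, height, step):
    """Label each grid intersection with its position as a percentage
    of the image size, the kind of hint a vision model could read."""
    xs, ys = grid_positions(width, height, step)
    labels = {}
    for x in xs:
        for y in ys:
            x_pct = x / width * 100
            y_pct = y / height * 100
            labels[(x, y)] = f"{x_pct:.0f}%,{y_pct:.0f}%"
    return labels


if __name__ == "__main__":
    print(grid_positions(1000, 500, 250))
    print(grid_labels(1000, 500, 250))
```

Expressing labels as percentages rather than raw pixels keeps them independent of the screenshot resolution.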
# ✨ Refined Vision Prompt: Integration of Keyboard Shortcuts over Search Function This PR proposes a significant methodological enhancement to the `VISION_PROMPT` framework. I'm proposing the `PRESS` action as a...
> Yeah, if someone could get a PR of a vision model working locally on the project that'd be great I think > Would this work? https://llava-vl.github.io/ https://simonwillison.net/2023/Nov/29/llamafile/ _Originally posted...
I was wondering if there was a reason we only picked the top response, or the 0th one. Instead, what if we asked the model to generate 9 responses, and...
Is it possible that a Mac Retina display, or a 4K screen in general, confuses the algorithm? The mouse could not find the elements that...
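One plausible cause, offered here as an assumption rather than a confirmed diagnosis: on a Retina display the screenshot can have twice as many pixels per axis as the logical screen size that mouse APIs use, so a coordinate read off the screenshot overshoots unless it is scaled back. The sketch below shows that conversion; the function names and the example sizes are hypothetical.

```python
def scale_factor(screenshot_size, screen_size):
    """Ratio of screenshot pixels to logical screen points per axis."""
    return (
        screenshot_size[0] / screen_size[0],
        screenshot_size[1] / screen_size[1],
    )


def pixel_to_screen_point(px, py, screenshot_size, screen_size):
    """Convert a pixel coordinate in the screenshot to the logical
    coordinate a mouse-control API would expect."""
    sx, sy = scale_factor(screenshot_size, screen_size)
    return round(px / sx), round(py / sy)


if __name__ == "__main__":
    # Assumed example: a 2880x1800 screenshot of a 1440x900 logical screen.
    screenshot = (2880, 1800)
    screen = (1440, 900)
    print(scale_factor(screenshot, screen))
    print(pixel_to_screen_point(1440, 900, screenshot, screen))
```

If the model returns positions as percentages of the image instead of pixels, the mismatch does not arise, since percentages map directly onto the logical screen size.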
**When prompting 'operate' to search the web, retrieve some info, and save it to {a google sheet}, why not implement more crawling, parsing, retrieving and memory logic to improve the tooling for...