self-operating-computer

A framework to enable multimodal models to operate a computer.

Results: 130 self-operating-computer issues

Facing this issue while running `operate --voice`. I have already run `brew install portaudio` and `pip3 install -r requirements-audio.txt`. OS: macOS Ventura 13.5.2

Is it possible to run this pointed at a self-hosted large language model instead of OpenAI?

I noticed that you currently seem to apply a grid to the images to assist the vision model: - https://github.com/OthersideAI/self-operating-computer/blob/main/operate/main.py#L462-L527 And mention this in the README: > **Current Challenges** >...
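For readers unfamiliar with the grid idea mentioned above, here is a minimal sketch of how evenly spaced grid lines and their percentage labels could be computed for a screenshot. It is not the code from `operate/main.py`; the function names `grid_lines` and `percent_label` and the spacing value are assumptions made for illustration only.

```python
def grid_lines(width, height, spacing):
    """Return the x and y pixel offsets at which grid lines would be drawn.

    Hypothetical helper: lines are placed every `spacing` pixels,
    excluding the image borders themselves.
    """
    if spacing <= 0:
        raise ValueError("spacing must be positive")
    xs = list(range(spacing, width, spacing))
    ys = list(range(spacing, height, spacing))
    return xs, ys


def percent_label(offset, extent):
    """Label a grid line by its position as a percentage of the image extent."""
    return f"{100 * offset / extent:.0f}%"


if __name__ == "__main__":
    xs, ys = grid_lines(1000, 500, 250)
    print([percent_label(x, 1000) for x in xs])
    print([percent_label(y, 500) for y in ys])
```

Labelling lines with percentages rather than raw pixels gives the vision model a reference that does not depend on the screenshot's resolution, which is one plausible reason to overlay a grid at all.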

# ✨ Refined Vision Prompt: Integration of Keyboard Shortcuts over Search Function
This PR proposes a significant methodological enhancement to the `VISION_PROMPT` framework. I'm proposing the `PRESS` action as a...

> Yeah, if someone could get a PR of a vision model working locally on the project that'd be great I think > Would this work? https://llava-vl.github.io/ https://simonwillison.net/2023/Nov/29/llamafile/ _Originally posted...

I was wondering if there was a reason we only picked the top response, or the 0th one. Instead, what if we asked the model to generate 9 responses, and...

Is it possible that a Mac's Retina display, or a 4K-resolution screen in general, confuses the algorithm? The mouse could not find the elements that...
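One possible cause, offered as an assumption rather than a confirmed diagnosis: on high-DPI displays a screenshot is often captured in physical pixels while the mouse is positioned in logical points, so coordinates read off the screenshot land in the wrong place unless they are rescaled. A minimal sketch of that conversion, using the hypothetical helper name `screenshot_to_screen`:

```python
def screenshot_to_screen(x_px, y_px, screenshot_size, screen_size):
    """Map a point in screenshot pixels to logical screen coordinates.

    screenshot_size: (width, height) of the captured image in pixels.
    screen_size:     (width, height) of the screen as the mouse API sees it.
    Both are assumed to cover the same area of the display.
    """
    shot_w, shot_h = screenshot_size
    screen_w, screen_h = screen_size
    if shot_w <= 0 or shot_h <= 0:
        raise ValueError("screenshot dimensions must be positive")
    # On a 2x Retina display the ratio below is 0.5 on both axes.
    return round(x_px * screen_w / shot_w), round(y_px * screen_h / shot_h)


if __name__ == "__main__":
    # A 2880x1800 screenshot of a screen reported as 1440x900 points.
    print(screenshot_to_screen(1000, 500, (2880, 1800), (1440, 900)))
```

If the two sizes are equal, the function returns the point unchanged, so applying it unconditionally is harmless on standard-resolution displays.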

**When prompting 'operate' to search the web, retrieve some info, and save it to {a Google Sheet}, why not implement more crawling, parsing, retrieving and memory logic to improve the tooling for...