self-operating-computer
A framework to enable multimodal models to operate a computer.
I just noticed something interesting. I use a French keyboard ("AZERTY"), and when the system "searches", it opens Spotlight and types "Google Chro,e", as if typing on a US keyboard,...
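The output is consistent with the tool replaying US QWERTY key positions. On AZERTY, the physical key that sits where US keyboards have `M` produces a comma, so "Chrome" becomes "Chro,e". The sketch below simulates that mismatch and its inverse. The lookup table is hypothetical and deliberately partial (lowercase letters and a few punctuation keys only), and the helper names are illustrative, not the project's code.

```python
# Hypothetical, partial table: the character a French AZERTY layout produces
# when the physical key labeled `us_char` on a US QWERTY keyboard is pressed.
# Lowercase only; uppercase and digits (which need Shift on AZERTY) are not covered.
US_KEY_TO_AZERTY = {
    "a": "q", "q": "a",
    "z": "w", "w": "z",
    "m": ",", ";": "m",
    ",": ";", ".": ":",
}

# Inverse table: to make `wanted` appear on an AZERTY machine, press this US key.
AZERTY_TO_US_KEY = {azerty: us for us, azerty in US_KEY_TO_AZERTY.items()}


def typed_on_azerty(us_keys: str) -> str:
    """Simulate the text that appears when US key positions are replayed on AZERTY."""
    return "".join(US_KEY_TO_AZERTY.get(c, c) for c in us_keys)


def remap_for_azerty(text: str) -> str:
    """Translate desired text into the US key positions that produce it on AZERTY."""
    return "".join(AZERTY_TO_US_KEY.get(c, c) for c in text)


print(typed_on_azerty("Google Chrome"))  # Google Chro,e
print(typed_on_azerty(remap_for_azerty("Google Chrome")))  # Google Chrome
```

Per-layout remapping like this is brittle; one alternative is to paste text from the clipboard, which does not depend on where individual characters sit on the keyboard.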
I'm a newbie, so please help :) I'm really interested in this project :D
Maybe a YOLO object detection model trained on basic things to get coordinates? Or something like SAM (Segment Anything)? I mean, as soon as there is a small model, GPT-4 can check...
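The idea in this comment could look roughly like the sketch below, assuming a detector (YOLO, SAM, or anything else) has already produced labeled bounding boxes in screenshot pixels. The `Detection` class and `click_point` helper are hypothetical and not part of the project or of any detector library. The large model would only need to choose which labeled element to act on, and the click lands at the center of that box.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass
class Detection:
    """Hypothetical detector output: a labeled box in screenshot pixels."""
    label: str
    confidence: float
    x1: float
    y1: float
    x2: float
    y2: float


def click_point(
    detections: Sequence[Detection], label: str, min_confidence: float = 0.5
) -> Optional[Tuple[int, int]]:
    """Return the center of the most confident box with `label`, or None if absent."""
    candidates = [
        d for d in detections if d.label == label and d.confidence >= min_confidence
    ]
    if not candidates:
        return None
    best = max(candidates, key=lambda d: d.confidence)
    return (round((best.x1 + best.x2) / 2), round((best.y1 + best.y2) / 2))


detections = [
    Detection("address_bar", 0.91, 100, 40, 900, 70),
    Detection("button", 0.80, 10, 10, 30, 30),
]
print(click_point(detections, "address_bar"))  # (500, 55)
```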
The X/Y coordinates inferred by the model are always off; it can't even select the address bar correctly.
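One frequent cause of offsets like this (an assumption, not a confirmed diagnosis for this issue) is a mismatch between the coordinate space the model answers in and the one the mouse library uses: on HiDPI/Retina displays a screenshot can have twice the resolution of the logical coordinates that functions such as `pyautogui.moveTo` expect. The sketch shows both conversions. `percent_to_screen` assumes the model returns percentages such as `"50%"`, and both helper names are hypothetical.

```python
from typing import Tuple


def percent_to_screen(
    x_percent: str, y_percent: str, screen_width: int, screen_height: int
) -> Tuple[int, int]:
    """Convert answers such as ("50%", "7%") into logical screen coordinates.

    `screen_width`/`screen_height` should be the logical size the mouse library
    works in (e.g. pyautogui.size()), not the pixel size of the screenshot.
    """
    x = float(x_percent.strip().rstrip("%")) / 100
    y = float(y_percent.strip().rstrip("%")) / 100
    # Clamp so a slightly out-of-range answer still lands on the screen.
    px = min(max(round(x * screen_width), 0), screen_width - 1)
    py = min(max(round(y * screen_height), 0), screen_height - 1)
    return (px, py)


def screenshot_to_screen(
    px: float, py: float, screenshot_size: Tuple[int, int], screen_size: Tuple[int, int]
) -> Tuple[int, int]:
    """Rescale a point measured on the screenshot into logical screen coordinates."""
    scale_x = screen_size[0] / screenshot_size[0]
    scale_y = screen_size[1] / screenshot_size[1]
    return (round(px * scale_x), round(py * scale_y))
```

For example, if screenshots come back at 2880x1800 while the mouse library reports a 1440x900 screen, `screenshot_to_screen(1000, 500, (2880, 1800), (1440, 900))` gives `(500, 250)`; clicking at the unscaled point would land far to the lower right.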
Request: `Open calculator and add 23 + 35`
Output:
```
Error parsing JSON: 403 Request had insufficient authentication scopes. [ reason: "ACCESS_TOKEN_SCOPE_INSUFFICIENT" domain: "googleapis.com" metadata { key: "service" value: "generativelanguage.googleapis.com"...
```
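The message in this report labels an authentication failure (HTTP 403 from `generativelanguage.googleapis.com`) as a JSON parsing error, which points users in the wrong direction. A minimal sketch of keeping the two failure modes apart, assuming the model call can be wrapped as a zero-argument function; `ModelAPIError` and `get_operation` are hypothetical names, not the project's code.

```python
import json
from typing import Any, Callable


class ModelAPIError(RuntimeError):
    """Hypothetical error type for failures raised by the model client itself."""


def get_operation(call_model: Callable[[], str]) -> Any:
    """Call the model, then parse its reply, reporting the two failure modes separately."""
    try:
        raw = call_model()
    except Exception as exc:  # e.g. a 403 ACCESS_TOKEN_SCOPE_INSUFFICIENT from the API
        raise ModelAPIError(f"Model request failed: {exc}") from exc

    try:
        return json.loads(raw)
    except json.JSONDecodeError as exc:
        # Only text that actually came back from the model reaches this branch.
        raise ValueError(f"Error parsing JSON: {exc.msg}: {raw!r}") from exc
```

With this split, the request above would surface as "Model request failed: 403 ...", which makes it clear the problem is credentials or scopes rather than the model's output.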
I've been reviewing the project's codebase and noticed that all the logic and functions are currently contained within a single file. This structure, while functional, can make the code challenging...
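One hypothetical way to start splitting the single file is a registry that maps operation names to handler functions, so each handler can move into its own module without touching the main loop. In this self-contained sketch the handlers only return descriptions instead of moving the mouse, and all names (`register`, `dispatch`, the operation fields) are illustrative assumptions rather than the project's structure.

```python
from typing import Callable, Dict

# In a split-up package each handler could live in its own module
# (e.g. a hypothetical operations/keyboard.py); here they share one file
# so the sketch stays self-contained.
Handler = Callable[[dict], str]
OPERATIONS: Dict[str, Handler] = {}


def register(name: str) -> Callable[[Handler], Handler]:
    """Decorator that adds a handler to the operation registry."""
    def decorator(func: Handler) -> Handler:
        OPERATIONS[name] = func
        return func
    return decorator


@register("click")
def click(args: dict) -> str:
    return f"click at {args['x']}, {args['y']}"


@register("type")
def type_text(args: dict) -> str:
    return f"type {args['text']!r}"


def dispatch(operation: dict) -> str:
    """Look up the handler for operation['operation'] and run it."""
    name = operation["operation"]
    if name not in OPERATIONS:
        raise ValueError(f"Unknown operation: {name}")
    return OPERATIONS[name](operation)
```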
**This PR is a work-in-progress.** The goal of this PR is to add support for LLaVA through Ollama.

Todo:
- [x] Successfully send prompt + image to LLaVA and get...
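For the first Todo item, Ollama's `/api/generate` endpoint accepts a list of base64-encoded images alongside the prompt for multimodal models such as LLaVA. A minimal standard-library sketch, assuming Ollama is running locally on its default port 11434 and the `llava` model has already been pulled; the helper names are hypothetical and this is not the PR's implementation.

```python
import base64
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def build_llava_request(prompt: str, image_bytes: bytes, model: str = "llava") -> dict:
    """Build the JSON body for /api/generate with one attached image."""
    return {
        "model": model,
        "prompt": prompt,
        # Images are sent as a list of base64-encoded strings.
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        # Ask for a single JSON response instead of a stream of chunks.
        "stream": False,
    }


def ask_llava(prompt: str, image_path: str, url: str = OLLAMA_URL) -> str:
    """Send a prompt plus a screenshot to a running Ollama server and return its text."""
    with open(image_path, "rb") as f:
        body = build_llava_request(prompt, f.read())
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),  # supplying data makes this a POST
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]
```

Keeping the payload builder separate from the network call makes it possible to check the request shape without a running server.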
**This PR aims to achieve two primary objectives:**
1. Support vertical mouse-wheel scrolling to let the model access UI elements which currently aren't on the screen.
2. Replace the CLICK...
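The first objective could be sketched as below, assuming the model emits a hypothetical scroll operation with `direction` and `amount` fields (the PR's actual format is not shown here). `pyautogui.scroll()` treats positive values as up and negative values as down; how far one unit moves varies by operating system and application.

```python
def scroll_clicks(direction: str, amount: int) -> int:
    """Convert a hypothetical {"direction": ..., "amount": ...} request to signed wheel units."""
    if amount < 0:
        raise ValueError("amount must be non-negative")
    direction = direction.lower()
    if direction == "up":
        return amount
    if direction == "down":
        return -amount
    raise ValueError(f"Unsupported scroll direction: {direction!r}")


def perform_scroll(operation: dict) -> None:
    """Execute the scroll with pyautogui, imported lazily so the parser works headless."""
    import pyautogui  # third-party; needs a desktop session

    pyautogui.scroll(scroll_clicks(operation["direction"], int(operation["amount"])))
```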
[Self-Operating Computer] Hello, I can help you with anything. What would you like done?
[User] google the word HI
Error parsing JSON: X get_image failed: error 8 (73, 0, 967)...
**PR Summary:** Adds a Dockerfile to support containerization as part of https://github.com/OthersideAI/self-operating-computer/issues/36.

**Changes Made:**
- Included a Dockerfile for containerization.
- Used `python:3.11-slim` as the base image (Considering it's...