self-operating-computer issues

Issue using non-QWERTY keyboards

6

I just noticed something interesting. I use a French keyboard ("AZERTY") and when the system "searches", it opens Spotlight and writes "Google Chro,e" as if typing on a US keyboard,...

fpapleux
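One plausible explanation, not confirmed in this report, is that keystrokes are sent as US key positions, so the French layout reinterprets each one. The sketch below simulates that effect with a deliberately partial, hypothetical position map (`US_POSITION_TO_AZERTY` and `typed_on_azerty` are illustrative names, not part of the project). The key where a US keyboard has "m" produces "," on AZERTY, which reproduces the "Chro,e" symptom.

```python
# Hypothetical, partial map: the character an AZERTY layout produces when the
# key at the position of the given US QWERTY letter is pressed.
# Only a few lowercase keys are listed; shifted punctuation is not modelled.
US_POSITION_TO_AZERTY = {
    "a": "q", "q": "a",
    "w": "z", "z": "w",
    "m": ",",
}


def typed_on_azerty(text: str) -> str:
    """Simulate typing `text` by US key position on an AZERTY keyboard."""
    out = []
    for ch in text:
        mapped = US_POSITION_TO_AZERTY.get(ch.lower(), ch.lower())
        # Keep uppercase only when the result is still a letter.
        out.append(mapped.upper() if ch.isupper() and mapped.isalpha() else mapped)
    return "".join(out)


if __name__ == "__main__":
    print(typed_on_azerty("Google Chrome"))  # Google Chro,e
```

A fix would go the other way: look up which physical key produces each wanted character on the active layout, or paste the text instead of typing it key by key.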

Parsing error, I don't know if my Gemini-pro API credentials have problem

6

I'm a newbie, please help :) I'm really interested in this project :D

TheoneThefool
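"Error parsing JSON" messages can also appear when the model reply is valid JSON wrapped in prose or a Markdown code fence, rather than a credentials problem; whether that applies here is not established by this report. A minimal tolerant parser, assuming the reply should contain a single JSON object (`parse_model_json` is a hypothetical helper, not the project's parser):

```python
import json
import re


def parse_model_json(raw: str):
    """Best-effort parse of a model reply that should contain one JSON object."""
    text = raw.strip()
    # Strip a surrounding Markdown fence such as ```json ... ``` if present.
    fenced = re.match(r"^```(?:json)?\s*(.*?)\s*```$", text, re.DOTALL)
    if fenced:
        text = fenced.group(1)
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        # Fall back to the outermost {...} span in the reply.
        start, end = text.find("{"), text.rfind("}")
        if start != -1 and end > start:
            return json.loads(text[start:end + 1])
        raise  # nothing that looks like JSON: re-raise the original error
```

If even this fails, printing the raw reply usually shows whether the API returned an error message (for example an authentication failure) instead of model output.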

Object detection

20

Maybe a YOLO object detection model trained on basic things to get coordinates? Or something like SAM? I mean, as soon as there is a small model, GPT-4 can check...

admineral
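With a detector in the loop, the step that turns its output into a click target can be as simple as taking the centre of the best matching box. The sketch below assumes a hypothetical detector output format (label, confidence, pixel box `(x1, y1, x2, y2)`); it does not call YOLO or SAM itself.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Detection:
    """Hypothetical detector output: a label, a score and a pixel box (x1, y1, x2, y2)."""
    label: str
    confidence: float
    box: tuple[float, float, float, float]


def click_point(detections: list[Detection], label: str, min_confidence: float = 0.5):
    """Return the centre of the most confident box with `label`, or None if none qualifies."""
    candidates = [d for d in detections if d.label == label and d.confidence >= min_confidence]
    if not candidates:
        return None
    best = max(candidates, key=lambda d: d.confidence)
    x1, y1, x2, y2 = best.box
    return round((x1 + x2) / 2), round((y1 + y2) / 2)


if __name__ == "__main__":
    dets = [
        Detection("address_bar", 0.91, (100, 40, 900, 70)),
        Detection("button", 0.80, (10, 10, 30, 30)),
    ]
    print(click_point(dets, "address_bar"))  # (500, 55)
```

In the setup the issue describes, GPT-4 would then only need to pick a label (or a numbered box) instead of inferring raw coordinates.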

Poor accuracy of pointer X/Y location inference

12

the X/Y coordinates inferred by the model are always off. It can't even select the address bar correctly.

ahsin-s
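One possible contributor to systematic offsets, not confirmed in this issue, is a mismatch between the screenshot's pixel size and the coordinate space the mouse API uses: on HiDPI (e.g. Retina) displays a screenshot can have twice as many pixels per side as the logical screen. A minimal sketch, assuming the model returns pixel coordinates on the screenshot (`screenshot_to_screen` is a hypothetical helper):

```python
def screenshot_to_screen(x, y, screenshot_size, screen_size):
    """Map a point in screenshot pixels to logical screen coordinates.

    screenshot_size and screen_size are (width, height) tuples.
    """
    shot_w, shot_h = screenshot_size
    screen_w, screen_h = screen_size
    # Rescale each axis independently.
    sx = x * screen_w / shot_w
    sy = y * screen_h / shot_h
    # Clamp so a slightly out-of-range estimate still lands on the screen.
    sx = min(max(sx, 0), screen_w - 1)
    sy = min(max(sy, 0), screen_h - 1)
    return round(sx), round(sy)


if __name__ == "__main__":
    # A 2880x1800 screenshot of a 1440x900 logical screen.
    print(screenshot_to_screen(1440, 100, (2880, 1800), (1440, 900)))  # (720, 50)
```

Without the rescaling step, every click on such a display would land at twice the intended distance from the top-left corner.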

Google Gemini Vision - Error with Request. (ACCESS_TOKEN_SCOPE_INSUFFICIENT)

12

Request: `Open calculator and add 23 + 35` Output: Error parsing JSON: 403 Request had insufficient authentication scopes. [ reason: "ACCESS_TOKEN_SCOPE_INSUFFICIENT" domain: "googleapis.com" metadata { key: "service" value: "generativelanguage.googleapis.com"...

haseeb-heaven

Proposal for Codebase Refactoring to Enhance Readability and Maintainability

9

I've been reviewing the project's codebase and noticed that all the logic and functions are currently contained within a single file. This structure, while functional, can make the code challenging...

gtlYashParmar

Add support for LLaVA through Ollama

**This PR is a work-in-progress.** The goal of this PR is to add support for LLaVA through Ollama. Todo: - [x] Successfully send prompt + image to LLaVA and get...

michaelhhogue

enhancement
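For reference, Ollama serves LLaVA over a local REST API: `POST /api/generate` accepts a model name, a prompt and a list of base64-encoded images, and with `"stream": false` returns one JSON object whose `response` field holds the text. The sketch below is not this PR's implementation; it assumes an Ollama server on its default port with the `llava` model already pulled (`ollama pull llava`), and the helper names are hypothetical.

```python
import base64
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def build_llava_request(prompt: str, image_bytes: bytes, model: str = "llava") -> urllib.request.Request:
    """Build a POST request sending one prompt plus one image to Ollama."""
    payload = {
        "model": model,
        "prompt": prompt,
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,  # ask for a single JSON reply instead of streamed chunks
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask_llava(prompt: str, image_bytes: bytes) -> str:
    """Send the request and return the model's text (requires a running Ollama server)."""
    with urllib.request.urlopen(build_llava_request(prompt, image_bytes)) as resp:
        return json.loads(resp.read())["response"]
```

Usage would look like `ask_llava("Describe this screen", open("screenshot.png", "rb").read())`, where the file name is only an example.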

Add scrolling support and replace CLICK action

13

**This PR aims to achieve two primary objectives:** 1. Support vertical mouse-wheel scrolling to let the model access UI elements which currently aren't on the screen. 2. Replace the CLICK...

michaelhhogue
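A scroll action needs a direction and an amount turned into signed mouse-wheel clicks. The sketch below assumes a hypothetical action shape such as `{"operation": "scroll", "direction": "down", "amount": 3}`, which is not necessarily the format this PR uses, and follows the `pyautogui.scroll` convention that positive values scroll up and negative values scroll down. The scroll function is injectable so the parsing logic can run without a display.

```python
def scroll_clicks(action: dict) -> int:
    """Convert a hypothetical scroll action into signed wheel clicks (positive = up)."""
    amount = int(action.get("amount", 1))
    if amount < 0:
        raise ValueError("amount must be non-negative; use 'direction' for the sign")
    direction = action.get("direction")
    if direction == "up":
        return amount
    if direction == "down":
        return -amount
    raise ValueError(f"unsupported scroll direction: {direction!r}")


def perform_scroll(action: dict, scroll=None) -> None:
    """Execute the action with `scroll`, defaulting to pyautogui.scroll."""
    if scroll is None:
        import pyautogui  # imported lazily so scroll_clicks works without a display

        scroll = pyautogui.scroll
    scroll(scroll_clicks(action))
```

Keeping the conversion separate from the call to `pyautogui` makes it easy to validate the model's output before anything moves on screen.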

Error parsing JSON: X get_image failed: error 8 (73, 0, 967)

11

[Self-Operating Computer] Hello, I can help you with anything. What would you like done? [User] google the word HI Error parsing JSON: X get_image failed: error 8 (73, 0, 967)...

Andy1996247

bug

Feat Containerize the application to improve cross OS compatibility

8

🚀 **PR Summary:** Adds a Dockerfile to support containerization as part of https://github.com/OthersideAI/self-operating-computer/issues/36. 🛠️ **Changes Made:** - Included Dockerfile for containerization. - Used `python:3.11-slim` as the base image (Considering it's...

legendkartik45
