self-operating-computer issues

adding README comment on -accuracy and beginning of the -accuracy grid rewrite, and delete Poetry artifacts from README

8

Closes #77

Voice mode is not working on mac

3

Facing this issue while running `operate --voice`. I have already run `brew install portaudio` and `pip3 install -r requirements-audio.txt`. OS: macOS Ventura 13.5.2

Siddhant-Goswami

Open source large language model support

11

Is it possible to run this and point it at a self-hosted large language model instead of OpenAI?

andzejsp

Integrate Set-of-Mark Visual Prompting for GPT-4V

7

I noticed that you currently seem to apply a grid to the images to assist the vision model: - https://github.com/OthersideAI/self-operating-computer/blob/main/operate/main.py#L462-L527 And mention this in the README: > **Current Challenges** >...

0xdevalias

Refined Vision Prompt: Integration of Keyboard Shortcuts over Search Function

10

This PR proposes a significant methodological enhancement to the `VISION_PROMPT` framework. I'm proposing the `PRESS` action as a...

0x5844

> Yeah, if someone could get a PR of a vision model working locally on the project that'd be great I think

4

> Yeah, if someone could get a PR of a vision model working locally on the project that'd be great I think > Would this work? https://llava-vl.github.io/ https://simonwillison.net/2023/Nov/29/llamafile/ _Originally posted...

Andy1996247

Generate multiple responses in polling and select the most popular choice? Particularly for -accurate grid overwrite

6

I was wondering if there was a reason we only picked the top response, or the 0th one. Instead, what if we asked the model to generate 9 responses, and...

klxu03
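The polling idea in the issue above can be sketched as a simple majority vote over several sampled responses. This is only a minimal illustration, not the project's implementation: the function name `most_popular` is hypothetical, and it assumes each response has already been reduced to a hashable value such as a grid-cell label.

```python
from collections import Counter
from typing import Hashable, Sequence


def most_popular(responses: Sequence[Hashable]) -> Hashable:
    """Return the response that occurs most often (hypothetical helper).

    Ties go to the response that appeared first, because
    Counter.most_common orders equal counts by first occurrence.
    """
    if not responses:
        raise ValueError("need at least one response to vote on")
    counts = Counter(responses)
    return counts.most_common(1)[0][0]


# Example: nine sampled answers, each a grid-cell label (made-up values).
samples = ["B2", "B2", "C3", "B2", "A1", "C3", "B2", "B2", "C3"]
print(most_popular(samples))
```

Voting only helps if the samples actually differ, so it presumes the model is sampled with some randomness rather than returning the same answer each time.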

Good job but seems to be missing some things

6

Is it possible that a Mac Retina display, or a 4K-resolution screen in general, confuses the algorithm? The mouse could not find the elements that...

pligor
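One possible cause of the behaviour described in the issue above, offered as an assumption rather than a diagnosis, is a mismatch between screenshot pixels and the logical points used for mouse positioning on high-density displays. A minimal sketch of the conversion follows; the function name `pixels_to_points` and the default scale factor of 2.0 are hypothetical choices for illustration.

```python
from typing import Tuple


def pixels_to_points(x_px: float, y_px: float,
                     scale_factor: float = 2.0) -> Tuple[float, float]:
    """Convert screenshot pixel coordinates to logical screen points.

    scale_factor is an assumed display scale (2.0 is typical for Retina
    panels); a real tool would query it from the OS instead.
    """
    if scale_factor <= 0:
        raise ValueError("scale_factor must be positive")
    return x_px / scale_factor, y_px / scale_factor


# A target found at pixel (200, 100) in a 2x screenshot maps to
# point (100.0, 50.0) in logical mouse coordinates.
print(pixels_to_points(200, 100))
```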

Google Search Results

1

When prompting `operate` to search the web, retrieve some info, and save it to a Google Sheet, why not implement more crawling, parsing, retrieving and memory logic to improve the tooling for...

TerminalGravity

search for default browser instead of google chrome.

7

ekinertac
