self-operating-computer icon indicating copy to clipboard operation
self-operating-computer copied to clipboard

A framework to enable multimodal models to operate a computer.

Results 130 self-operating-computer issues
Sort by recently updated
recently updated
newest added

Changed numpy version ## What does this PR do? Fixes # (issue) ## Requirement/Documentation - If there is a requirement document, please, share it here. ## Type of change -...

### Is your feature request related to a problem? Please describe. Would be nice if we could any multimodal model from ollama, especially models with more parameters. Llava 7b is...

enhancement

## What does this PR do? Fixes # (issue) ## Requirement/Documentation - If there is a requirement document, please, share it here. ## Type of change - [ ] Bug...

## What does this PR do? - support OmniparserV2 official api - integrate Omniparser with Qwen ## Requirement/Documentation - Omniparser api: https://github.com/microsoft/OmniParser/blob/master/omnitool/omniparserserver/omniparserserver.py ## Type of change - [x] New feature...

## What does this PR do? When a website or the screen doesn't load quickly (in raspberry pi for example). The SOC now is capable of waiting for some time...

## What does this PR do? This fixes the ollama client code that is supposed to use the configured client vs `ollama.chat` directly. ## Requirement/Documentation Use [client code](https://github.com/ollama/ollama-python?tab=readme-ov-file#custom-client) vs [direct](https://github.com/ollama/ollama-python?tab=readme-ov-file#usage)...

Hey @joshbickett, I’m a college student exploring this repository and would like to do some research on it. I have a few questions: - Can this be used in headless...

enhancement

## What does this PR do? Allows Gemini to work again and reduce its failure rate. - Fixes Gemini model name in config.py - Improve success rate of Gemini tasks...

When a webpage or else lasts longer to appear, the AI thinks it would be good to wait for a bit to the page to load, but theres no such...

bug

## What does this PR do? This PR simplifies the screenshot code by removing the need to create the screenshots dir by adding a `.keep` file and retaining the directory....