self-operating-computer
Interactive Model Selection, Custom Prompts, and Expanded Model Support
Version v2.0.15 (Latest) Release Summary
New Features:
- Centralized Model Management: All model configurations are now managed in a single file (`operate/models/model_configs.py`), making it easier to add, remove, and manage models.
- Expanded Ollama Model Support: Added support for `qwen2.5vl:3b` and `gemma3:4b`.
- Enhanced Debugging: Added a `-d` flag (alias for `--verbose`) that provides detailed debugging information, including the full prompt sent to the AI and the raw response received.
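As a rough illustration of the two ideas above, the sketch below shows one way a single-file model registry and a `-d`/`--verbose` flag could fit together. It is not the contents of `operate/models/model_configs.py`: `ModelConfig`, `MODEL_CONFIGS`, `get_model_config`, `build_parser`, the `provider` field, and the `example-cli` program name are all hypothetical. Only the two model names and the flag spelling come from the release notes.

```python
import argparse
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelConfig:
    """Hypothetical per-model settings kept in one central place."""
    name: str
    provider: str


# Hypothetical registry: adding or removing a model means editing one dict.
MODEL_CONFIGS = {
    "qwen2.5vl:3b": ModelConfig(name="qwen2.5vl:3b", provider="ollama"),
    "gemma3:4b": ModelConfig(name="gemma3:4b", provider="ollama"),
}


def get_model_config(name: str) -> ModelConfig:
    """Look up a model, listing the known names if the lookup fails."""
    try:
        return MODEL_CONFIGS[name]
    except KeyError:
        available = ", ".join(sorted(MODEL_CONFIGS))
        raise ValueError(
            f"Unknown model {name!r}. Available: {available}"
        ) from None


def build_parser() -> argparse.ArgumentParser:
    """Build a parser where -d and --verbose set the same boolean."""
    parser = argparse.ArgumentParser(prog="example-cli")
    parser.add_argument(
        "-d", "--verbose",
        action="store_true",
        help="print the full prompt and the raw model response",
    )
    return parser
```

With this layout, `build_parser().parse_args(["-d"]).verbose` and `build_parser().parse_args(["--verbose"]).verbose` are both `True`, and a model selection screen could iterate over `MODEL_CONFIGS` instead of keeping its own list.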
Improvements:
- Improved System Prompt: The system prompt has been enhanced with a more structured format, explicit JSON schema definitions, and clear examples to improve model accuracy and reliability.
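An explicit schema in the prompt pays off most when the response is also checked against it before any action is executed. The sketch below is a minimal, hand-rolled validator under assumed conventions: the operation names and their required fields in `REQUIRED_FIELDS`, and the `parse_actions` helper itself, are hypothetical and are not taken from the project's actual prompt or schema.

```python
import json

# Hypothetical schema: each operation name maps to the fields it must carry.
REQUIRED_FIELDS = {
    "click": {"x", "y"},
    "write": {"content"},
    "press": {"keys"},
    "done": {"summary"},
}


def parse_actions(raw: str) -> list:
    """Parse a raw model response and reject anything off-schema.

    Raises ValueError (json.JSONDecodeError is a subclass) on bad input.
    """
    actions = json.loads(raw)
    if not isinstance(actions, list):
        raise ValueError("expected a JSON array of actions")
    for index, action in enumerate(actions):
        if not isinstance(action, dict):
            raise ValueError(f"action {index} is not a JSON object")
        operation = action.get("operation")
        if not isinstance(operation, str) or operation not in REQUIRED_FIELDS:
            raise ValueError(f"action {index} has unknown operation {operation!r}")
        missing = REQUIRED_FIELDS[operation] - set(action)
        if missing:
            raise ValueError(
                f"action {index} ({operation}) is missing {sorted(missing)}"
            )
    return actions
```

Failing fast here keeps a malformed response from turning into an unintended mouse or keyboard action, and the error text can be fed back to the model for a retry.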
Bug Fixes:
- Fixed an issue where the model selection screen was not correctly displaying all available models.
- Resolved an `IndentationError` in the model configuration file.
Using the same inputs and outputs as a human operator, the model views the screen and decides on a series of mouse and keyboard actions to reach an objective. Released in November 2023, the Self-Operating Computer Framework was one of the first examples of using a multimodal model to view the screen and operate a computer.
Key Features
- Compatibility: Designed for various multimodal models.
- Expanded Model Support: Now integrated with the latest OpenAI o3, o4-mini, GPT-4.1, GPT-4.1 mini, GPT-4.1 nano, Gemini 2.5 Pro, Gemini 2.5 Flash, Gemma 3n (including the `e2b` and `e4b` variants), and Gemma 3:12b models, alongside existing support for GPT-4o, Claude 3, Qwen-VL, and LLaVa.
- Enhanced Ollama Integration: Improved handling for Ollama models, including default host configuration and more informative error messages.
- Future Plans: Support for additional models.