Interactive Model Selection, Custom Prompts, and Expanded Model Support

Open malah-code opened this issue 5 months ago • 0 comments

Version v2.0.15 (Latest) Release Summary

New Features:

  • Centralized Model Management: All model configurations are now managed in a single file (operate/models/model_configs.py), making it easier to add, remove, and manage models.
  • Expanded Ollama Model Support: Added support for qwen2.5vl:3b and gemma3:4b.
  • Enhanced Debugging: Added a -d flag (alias for --verbose) that provides detailed debugging information, including the full prompt sent to the AI and the raw response received.
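The centralized registry and the -d flag described above could fit together roughly as follows. This is only a sketch under stated assumptions: the `ModelConfig` fields, the `MODEL_CONFIGS` name, the `-m/--model` option, the default model, and the `debug_print` helper are all hypothetical and are not taken from operate/models/model_configs.py.

```python
import argparse
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelConfig:
    """One entry in a (hypothetical) central model registry."""
    name: str      # identifier the user passes on the command line
    provider: str  # backend that serves the model, e.g. "ollama"


# Hypothetical registry: adding or removing a model is a one-line change here.
MODEL_CONFIGS = {
    cfg.name: cfg
    for cfg in (
        ModelConfig("qwen2.5vl:3b", "ollama"),
        ModelConfig("gemma3:4b", "ollama"),
    )
}


def build_parser() -> argparse.ArgumentParser:
    """Build a CLI parser whose model choices come from the registry."""
    parser = argparse.ArgumentParser(prog="operate")
    parser.add_argument(
        "-m", "--model",
        choices=sorted(MODEL_CONFIGS),
        default="gemma3:4b",  # assumed default, for illustration only
        help="model to run",
    )
    # Passing both option strings makes -d a short alias for --verbose.
    parser.add_argument(
        "-d", "--verbose",
        action="store_true",
        help="print the full prompt and the raw model response",
    )
    return parser


def debug_print(enabled: bool, label: str, payload: str) -> None:
    """Print a labelled payload only when verbose mode is on."""
    if enabled:
        print(f"[debug] {label}:\n{payload}")


if __name__ == "__main__":
    args = build_parser().parse_args(["-m", "qwen2.5vl:3b", "-d"])
    debug_print(args.verbose, "selected model", repr(MODEL_CONFIGS[args.model]))
```

Because the parser reads its choices from the registry, a model added to the single config file shows up in the CLI without further changes.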

Improvements:

  • Improved System Prompt: The system prompt has been enhanced with a more structured format, explicit JSON schema definitions, and clear examples to improve model accuracy and reliability.
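An explicit JSON schema in the prompt is most useful when the reply is also checked against it before any action is executed. The sketch below illustrates that idea with the standard library only. The operation names and required fields (`click`, `write`, `press`, `done`) are assumptions made for illustration and may not match the schema the release actually ships.

```python
import json

# Assumed schema: each operation type and the fields it must carry.
REQUIRED_FIELDS = {
    "click": {"x", "y"},
    "write": {"content"},
    "press": {"keys"},
    "done": {"summary"},
}


def parse_operations(raw: str) -> list:
    """Parse a model reply and reject anything that breaks the assumed schema."""
    data = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on bad JSON
    if not isinstance(data, list):
        raise ValueError("expected a JSON array of operations")
    for index, op in enumerate(data):
        if not isinstance(op, dict):
            raise ValueError(f"operation {index} is not a JSON object")
        kind = op.get("operation")
        if kind not in REQUIRED_FIELDS:
            raise ValueError(f"operation {index} has unknown type {kind!r}")
        missing = REQUIRED_FIELDS[kind] - op.keys()
        if missing:
            raise ValueError(f"operation {index} is missing {sorted(missing)}")
    return data


if __name__ == "__main__":
    reply = '[{"operation": "click", "x": "0.50", "y": "0.60"}]'
    print(parse_operations(reply))
```

Rejecting malformed replies early gives the caller a clear error to log or retry on, rather than a failure partway through a mouse or keyboard action.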

Bug Fixes:

  • Fixed an issue where the model selection screen was not correctly displaying all available models.
  • Resolved an IndentationError in the model configuration file.

Using the same inputs and outputs as a human operator, the model views the screen and decides on a series of mouse and keyboard actions to reach an objective. Released in November 2023, the Self-Operating Computer Framework was one of the first examples of using a multimodal model to view the screen and operate a computer.

Key Features

  • Compatibility: Designed for various multimodal models.
  • Expanded Model Support: Now integrated with the latest OpenAI models (o3, o4-mini, GPT-4.1, GPT-4.1 mini, GPT-4.1 nano), Gemini 2.5 Pro and Gemini 2.5 Flash, Gemma 3n (including the e2b and e4b variants), and Gemma 3:12b, alongside existing support for GPT-4o, Claude 3, Qwen-VL, and LLaVa.
  • Enhanced Ollama Integration: Improved handling for Ollama models, including default host configuration and more informative error messages.
  • Future Plans: Support for additional models.
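The default host handling mentioned under Enhanced Ollama Integration might look something like the sketch below. It assumes the host is read from the `OLLAMA_HOST` environment variable and falls back to Ollama's standard local address, `http://localhost:11434`. The function name and the error wording are hypothetical, and the framework's real lookup may differ.

```python
import os
from urllib.parse import urlparse

# Ollama's standard local address; used when nothing else is configured.
DEFAULT_OLLAMA_HOST = "http://localhost:11434"


def resolve_ollama_host(env=None) -> str:
    """Return the Ollama base URL, falling back to the local default."""
    env = os.environ if env is None else env
    host = env.get("OLLAMA_HOST", "").strip() or DEFAULT_OLLAMA_HOST
    if "://" not in host:
        host = "http://" + host  # accept bare "host:port" values
    if not urlparse(host).hostname:
        raise ValueError(
            f"OLLAMA_HOST={host!r} is not a valid address; "
            f"expected something like {DEFAULT_OLLAMA_HOST}"
        )
    return host.rstrip("/")


if __name__ == "__main__":
    print(resolve_ollama_host({}))
    print(resolve_ollama_host({"OLLAMA_HOST": "192.168.1.5:11434"}))
```

Passing the environment as a parameter keeps the function easy to test without touching the real process environment.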

malah-code, Jul 07 '25 14:07