ClickClickClick

A robust framework enabling autonomous Android and computer control using any LLM (local or remote)

🚀 Features

Multi-platform support: Android devices and macOS computers
Multiple LLM providers: OpenAI, Anthropic Claude, Google Gemini, and local Ollama models
Flexible interfaces: CLI, API, and web-based Gradio interface
Visual automation: Screenshot-based element detection and interaction
Configurable execution: Customizable timeouts, delays, and coordinate settings

📋 Table of Contents

Quick Start
Prerequisites
Installation
Usage
- Web Interface (Gradio)
- Command Line Interface
- Python API
- REST API
Configuration
Model Recommendations
Examples
Troubleshooting
Contributing
License

🎯 Quick Start

Install the package:

pip install git+https://github.com/instavm/clickclickclick.git

Set up API keys (choose one):

export OPENAI_API_KEY="your-openai-key"
# OR
export ANTHROPIC_API_KEY="your-anthropic-key"
# OR
export GEMINI_API_KEY="your-gemini-key"

Run a simple task:

click3 run "open Gmail and check for new messages"

📋 Prerequisites

For Android Control

ADB (Android Debug Bridge): Install Android SDK Platform Tools
USB Debugging: Enable on your Android device
USB Connection: Connect device to computer

For macOS Control

Python 3.11+: Required for all functionality
Accessibility Permissions: Grant to Terminal/IDE when prompted

System Requirements

Python 3.11 or higher
4GB+ RAM recommended
Internet connection for cloud LLM providers

📦 Installation

Option 1: Direct Installation

pip install git+https://github.com/instavm/clickclickclick.git

Option 2: Development Installation

git clone https://github.com/instavm/clickclickclick
cd clickclickclick
pip install -e .

Verify Installation

click3 --help

🎮 Usage

Web Interface (Gradio)

Launch the interactive web interface:

click3 gradio

Features:

Visual task input and monitoring
Real-time screenshot feedback
Model selection and configuration
Task history and logs

Command Line Interface

Basic Usage:

click3 run "your task description"

Advanced Options:

click3 run "open calculator and compute 25 * 47" \
  --platform=android \
  --planner-model=openai \
  --finder-model=gemini

Available Options:

--platform: Target platform (android or osx)
--planner-model: Planning LLM (openai, anthropic, gemini, ollama)
--finder-model: Element detection LLM (openai, anthropic, gemini, ollama)
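
When click3 is driven from another script, the options above can be assembled programmatically. The following is a minimal sketch that uses only the flags and values documented in this section; `build_click3_command` is a hypothetical helper and is not part of the package.

```python
import shlex

# Values taken from the "Available Options" list above.
PLATFORMS = {"android", "osx"}
MODELS = {"openai", "anthropic", "gemini", "ollama"}


def build_click3_command(task, platform=None, planner_model=None, finder_model=None):
    """Return the argv list for a `click3 run` invocation (hypothetical helper)."""
    if platform is not None and platform not in PLATFORMS:
        raise ValueError(f"unknown platform: {platform!r}")
    for role, value in (("planner", planner_model), ("finder", finder_model)):
        if value is not None and value not in MODELS:
            raise ValueError(f"unknown {role} model: {value!r}")

    cmd = ["click3", "run", task]
    # Only append flags that were explicitly requested, so click3 defaults apply otherwise.
    if platform:
        cmd.append(f"--platform={platform}")
    if planner_model:
        cmd.append(f"--planner-model={planner_model}")
    if finder_model:
        cmd.append(f"--finder-model={finder_model}")
    return cmd


if __name__ == "__main__":
    cmd = build_click3_command(
        "open calculator and compute 25 * 47",
        platform="android",
        planner_model="openai",
        finder_model="gemini",
    )
    # Print a copy-pasteable shell line; pass `cmd` to subprocess.run to execute it.
    print(shlex.join(cmd))
```

Building the command as a list rather than a string avoids shell-quoting problems when the task description contains quotes or special characters.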

Python API

from clickclickclick.config import get_config
from clickclickclick.planner.task import execute_task
from clickclickclick.utils import get_executor, get_planner, get_finder

# Configure execution
config = get_config("android", "openai", "gemini")
executor = get_executor("android")
planner = get_planner("openai", config, executor)
finder = get_finder("gemini", config, executor)

# Execute task
success = execute_task(
    "open the weather app",
    executor, planner, finder, config
)

REST API

Start the API server:

uvicorn api:app --host 0.0.0.0 --port 8000

Execute tasks via HTTP:

curl -X POST "http://localhost:8000/execute" \
  -H "Content-Type: application/json" \
  -d '{
    "task_prompt": "open calculator",
    "platform": "android",
    "planner_model": "openai",
    "finder_model": "gemini"
  }'

Response:

{"result": true}
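
The same request can be sent from Python with the standard library alone. This sketch assumes the server started above is listening on `http://localhost:8000` and that the response always carries a `result` field, as in the sample; the function names and the 300-second timeout are illustrative choices, not part of the project.

```python
import json
import urllib.request


def build_execute_request(task_prompt, platform="android", planner_model="openai",
                          finder_model="gemini", base_url="http://localhost:8000"):
    """Build the POST /execute request with the JSON body shown in the curl example."""
    payload = {
        "task_prompt": task_prompt,
        "platform": platform,
        "planner_model": planner_model,
        "finder_model": finder_model,
    }
    return urllib.request.Request(
        f"{base_url}/execute",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def execute_task_over_http(task_prompt, timeout=300, **kwargs):
    """Send the request and return the boolean `result` field of the response."""
    request = build_execute_request(task_prompt, **kwargs)
    # Tasks can take a while, so the timeout is generous (an assumed value).
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return bool(body.get("result"))


if __name__ == "__main__":
    print(execute_task_over_http("open calculator"))
```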

⚙️ Configuration

Configuration is managed through config/models.yaml. Key settings include:

Model Configuration

openai:
  api_key: !ENV OPENAI_API_KEY
  model_name: gpt-4o-mini
  image_width: 512
  image_height: 512

gemini:
  api_key: !ENV GEMINI_API_KEY
  model_name: gemini-1.5-flash
  image_width: 768
  image_height: 768

Executor Configuration

executor:
  android:
    screen_center_x: 500
    screen_center_y: 1000
    scroll_distance: 1000
    swipe_distance: 600
    long_press_duration: 1000
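
To make these values concrete, the sketch below shows one way such settings could translate into `adb shell input swipe` commands. It assumes that coordinates and distances are in pixels, that durations are in milliseconds, and that a scroll is centered on the configured screen center; the project's actual executor may compute gestures differently, and both helper functions are hypothetical.

```python
# Mirrors the executor.android values shown above.
ANDROID_EXECUTOR = {
    "screen_center_x": 500,
    "screen_center_y": 1000,
    "scroll_distance": 1000,
    "swipe_distance": 600,
    "long_press_duration": 1000,
}


def scroll_down_command(cfg, duration_ms=300):
    """Swipe the finger upward around the screen center, which scrolls content down.

    duration_ms is an assumed default, not a documented setting.
    """
    x = cfg["screen_center_x"]
    half = cfg["scroll_distance"] // 2
    start_y = cfg["screen_center_y"] + half  # start below the center
    end_y = cfg["screen_center_y"] - half    # end above the center
    return ["adb", "shell", "input", "swipe",
            str(x), str(start_y), str(x), str(end_y), str(duration_ms)]


def long_press_command(cfg, x, y):
    """Emulate a long press as a zero-length swipe held for long_press_duration."""
    return ["adb", "shell", "input", "swipe",
            str(x), str(y), str(x), str(y), str(cfg["long_press_duration"])]


if __name__ == "__main__":
    print(" ".join(scroll_down_command(ANDROID_EXECUTOR)))
    print(" ".join(long_press_command(ANDROID_EXECUTOR, 250, 400)))
```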

Environment Variables

Required API keys (set one or more):

OPENAI_API_KEY: OpenAI GPT models
ANTHROPIC_API_KEY: Anthropic Claude models
GEMINI_API_KEY: Google Gemini models
OLLAMA_MODEL_NAME: Local Ollama model name
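
Before starting a task, it can help to check which providers are usable in the current environment. A minimal sketch, assuming the variable names listed above are the only ones that matter; `configured_providers` is a hypothetical helper, not part of the package.

```python
import os

# Provider name -> environment variable, as listed above.
PROVIDER_ENV_VARS = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "gemini": "GEMINI_API_KEY",
    "ollama": "OLLAMA_MODEL_NAME",
}


def configured_providers(env=None):
    """Return the providers whose environment variable is set to a non-blank value."""
    env = os.environ if env is None else env
    return [
        provider
        for provider, var in PROVIDER_ENV_VARS.items()
        if env.get(var, "").strip()
    ]


if __name__ == "__main__":
    found = configured_providers()
    # Report names only; never print the key values themselves.
    print("Configured providers:", ", ".join(found) if found else "none")
```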

🎯 Model Recommendations

Based on performance testing:

Use Case	Recommended Setup	Performance
Best Overall	Planner: GPT-4o, Finder: Gemini Flash	⭐⭐⭐⭐⭐
Cost Effective	Planner: GPT-4o-mini, Finder: Gemini Flash	⭐⭐⭐⭐
Privacy Focused	Planner: Ollama, Finder: Ollama	⭐⭐⭐
Speed Optimized	Planner: Gemini Flash, Finder: Gemini Flash	⭐⭐⭐⭐

Notes:

Gemini Flash offers 15 free API calls daily
GPT-4o provides the most reliable planning
Ollama enables fully offline operation
Anthropic Claude offers balanced performance

📱 Examples

Android Examples

Gmail Task:

click3 run "create a draft email to [email protected] asking about lunch plans for Saturday at 1PM"

Navigation:

click3 run "open Google Maps and find bus stops in Alanson, MI"

Gaming:

click3 run "start a 3+2 chess game on lichess"

macOS Examples

Web Browsing:

click3 run "open Safari, go to news.ycombinator.com and read the top story" --platform=osx

System Tasks:

click3 run "open System Preferences and check the current display resolution" --platform=osx

🔧 Troubleshooting

Common Issues

ADB Connection Problems:

# Check device connection
adb devices

# Restart ADB server
adb kill-server
adb start-server
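
The connection check can also be scripted. This sketch parses the output of `adb devices` and keeps only devices in the `device` state, which is the state ADB reports once USB debugging has been authorized; the helper names are hypothetical.

```python
import subprocess


def parse_adb_devices(output):
    """Parse `adb devices` output into a list of (serial, state) tuples."""
    devices = []
    for line in output.splitlines():
        line = line.strip()
        # Skip the header, blank lines, and "* daemon ..." status messages.
        if not line or line.startswith("List of devices") or line.startswith("*"):
            continue
        parts = line.split()
        if len(parts) >= 2:
            devices.append((parts[0], parts[1]))
    return devices


def ready_devices():
    """Run `adb devices` and return the serials that are ready for commands."""
    result = subprocess.run(
        ["adb", "devices"], capture_output=True, text=True, check=True
    )
    return [serial for serial, state in parse_adb_devices(result.stdout)
            if state == "device"]


if __name__ == "__main__":
    serials = ready_devices()
    print("Ready devices:", serials if serials else "none (check cable and USB debugging)")
```

A device listed as `unauthorized` usually means the debugging prompt on the phone has not been accepted yet.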

API Key Issues:

# Verify environment variables
echo $OPENAI_API_KEY
echo $GEMINI_API_KEY

# Set keys temporarily
export OPENAI_API_KEY="your-key-here"

Permission Errors (macOS):

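Grant Accessibility permissions in System Preferences > Security & Privacy > Privacy > Accessibility (on macOS 13 and later: System Settings > Privacy & Security > Accessibility)
Allow Terminal or your IDE to control other applications, then restart it so the new permission takes effect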

Model-Specific Issues:

Ollama: Ensure the model is downloaded (ollama pull llama3.2-vision)
Gemini: Check API quota at Google AI Studio
OpenAI: Verify billing and usage limits

Debug Mode

Enable detailed logging:

import logging
logging.basicConfig(level=logging.DEBUG)

Performance Optimization

Reduce image resolution in config/models.yaml
Increase TASK_DELAY for slower devices
Use smaller models for faster response times

🤝 Contributing

We welcome contributions! Please:

Open an issue to discuss your idea
Fork the repository
Create a feature branch
Make your changes with tests
Submit a pull request

Development Setup

git clone https://github.com/instavm/clickclickclick
cd clickclickclick
pip install -e ".[test]"
pytest

📈 Roadmap

[ ] iOS support via WebDriverAgent
[ ] Windows support with Win32 APIs
[ ] Voice command integration
[ ] Multi-device orchestration
[ ] Enhanced error recovery
[ ] Plugin system for custom actions

📄 License

This project is licensed under the MIT License. See the LICENSE file for details.

🆘 Support

📖 Documentation: Check the examples and configuration sections
🐛 Bug Reports: Create an issue
💬 Discussions: GitHub Discussions
⭐ Star the repo if you find it useful!

Made with ❤️ by InstaVM | Follow us for updates!
