ClickClickClick
A robust framework enabling autonomous Android and computer control using any LLM (local or remote)
🚀 Features
- Multi-platform support: Android devices and macOS computers
- Multiple LLM providers: OpenAI, Anthropic Claude, Google Gemini, and local Ollama models
- Flexible interfaces: CLI, API, and web-based Gradio interface
- Visual automation: Screenshot-based element detection and interaction
- Configurable execution: Customizable timeouts, delays, and coordinate settings
📋 Table of Contents
- Quick Start
- Prerequisites
- Installation
- Usage
  - Web Interface (Gradio)
  - Command Line Interface
  - Python API
  - REST API
- Configuration
- Model Recommendations
- Examples
- Troubleshooting
- Contributing
- License
🎯 Quick Start
- Install the package:
  pip install git+https://github.com/instavm/clickclickclick.git
- Set up API keys (choose one):
  export OPENAI_API_KEY="your-openai-key"
  # OR
  export ANTHROPIC_API_KEY="your-anthropic-key"
  # OR
  export GEMINI_API_KEY="your-gemini-key"
- Run a simple task:
  click3 run "open Gmail and check for new messages"
📋 Prerequisites
For Android Control
- ADB (Android Debug Bridge): Install Android SDK Platform Tools
- USB Debugging: Enable on your Android device
- USB Connection: Connect device to computer
For macOS Control
- Python 3.11+: Required for all functionality
- Accessibility Permissions: Grant to Terminal/IDE when prompted
System Requirements
- Python 3.11 or higher
- 4GB+ RAM recommended
- Internet connection for cloud LLM providers
📦 Installation
Option 1: Direct Installation
pip install git+https://github.com/instavm/clickclickclick.git
Option 2: Development Installation
git clone https://github.com/instavm/clickclickclick
cd clickclickclick
pip install -e .
Verify Installation
click3 --help
🎮 Usage
Web Interface (Gradio)
Launch the interactive web interface:
click3 gradio
Features:
- Visual task input and monitoring
- Real-time screenshot feedback
- Model selection and configuration
- Task history and logs
Command Line Interface
Basic Usage:
click3 run "your task description"
Advanced Options:
click3 run "open calculator and compute 25 * 47" \
--platform=android \
--planner-model=openai \
--finder-model=gemini
Available Options:
- --platform: Target platform (android or osx)
- --planner-model: Planning LLM (openai, anthropic, gemini, ollama)
- --finder-model: Element detection LLM (openai, anthropic, gemini, ollama)
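When the CLI is driven from a script, it can help to assemble and validate the arguments before launching the process. The following is a minimal sketch that uses only the flags and values listed above; the helper name `build_click3_command` and its default values are illustrative assumptions, not part of the package.

```python
import shlex

# Values taken from the option list above.
ALLOWED_PLATFORMS = {"android", "osx"}
ALLOWED_MODELS = {"openai", "anthropic", "gemini", "ollama"}


def build_click3_command(task, platform="android",
                         planner_model="openai", finder_model="gemini"):
    """Return an argv list for `click3 run` (hypothetical helper).

    The defaults here are this sketch's own choice, not the CLI's.
    """
    if platform not in ALLOWED_PLATFORMS:
        raise ValueError(f"unknown platform: {platform!r}")
    for flag, value in (("planner-model", planner_model),
                        ("finder-model", finder_model)):
        if value not in ALLOWED_MODELS:
            raise ValueError(f"unknown {flag}: {value!r}")
    return [
        "click3", "run", task,
        f"--platform={platform}",
        f"--planner-model={planner_model}",
        f"--finder-model={finder_model}",
    ]


if __name__ == "__main__":
    argv = build_click3_command("open calculator and compute 25 * 47")
    # Print a shell-safe version of the command for inspection.
    print(shlex.join(argv))
```

Passing the returned list to `subprocess.run(argv, check=True)` would launch the task without any shell quoting concerns, since the task description stays a single argument.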
Python API
from clickclickclick.config import get_config
from clickclickclick.planner.task import execute_task
from clickclickclick.utils import get_executor, get_planner, get_finder

# Configure execution
config = get_config("android", "openai", "gemini")
executor = get_executor("android")
planner = get_planner("openai", config, executor)
finder = get_finder("gemini", config, executor)

# Execute task
success = execute_task(
    "open the weather app",
    executor, planner, finder, config
)
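Because a UI task can fail for transient reasons (a slow screen transition, a missed element), a caller may want to retry it. The sketch below is one possible wrapper, assuming that `execute_task` returns a truthy value on success, as the `success` variable above suggests. `run_with_retries` is a hypothetical helper, not part of the package, and it takes any zero-argument callable so it stays independent of the framework.

```python
import time
from typing import Callable


def run_with_retries(run_once: Callable[[], bool],
                     attempts: int = 3,
                     delay_seconds: float = 2.0) -> bool:
    """Call run_once until it returns a truthy value or attempts run out."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        if run_once():
            return True
        # Pause between attempts, but not after the final one.
        if attempt < attempts:
            time.sleep(delay_seconds)
    return False
```

With the objects from the snippet above, a call could look like `run_with_retries(lambda: execute_task("open the weather app", executor, planner, finder, config))`.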
REST API
Start the API server:
uvicorn api:app --host 0.0.0.0 --port 8000
Execute tasks via HTTP:
curl -X POST "http://localhost:8000/execute" \
-H "Content-Type: application/json" \
-d '{
"task_prompt": "open calculator",
"platform": "android",
"planner_model": "openai",
"finder_model": "gemini"
}'
Response:
{"result": true}
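The same request can be sent from Python using only the standard library. This is a minimal sketch that mirrors the curl call above: the endpoint, field names, and the `result` key come from that example, while the function names, the default base URL parameter, and the timeout are assumptions made for illustration.

```python
import json
import urllib.request


def build_execute_request(task_prompt, platform="android",
                          planner_model="openai", finder_model="gemini",
                          base_url="http://localhost:8000"):
    """Build a POST request for the /execute endpoint shown above."""
    payload = {
        "task_prompt": task_prompt,
        "platform": platform,
        "planner_model": planner_model,
        "finder_model": finder_model,
    }
    return urllib.request.Request(
        url=f"{base_url}/execute",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def execute_task_over_http(request, timeout=300.0):
    """Send the request and return the boolean `result` field.

    The long timeout is a guess; UI tasks can take a while to finish.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.loads(response.read().decode("utf-8"))
    return bool(body.get("result"))
```

Splitting request construction from sending keeps the payload easy to inspect or log before anything touches the device.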
⚙️ Configuration
Configuration is managed through config/models.yaml. Key settings include:
Model Configuration
openai:
  api_key: !ENV OPENAI_API_KEY
  model_name: gpt-4o-mini
  image_width: 512
  image_height: 512
gemini:
  api_key: !ENV GEMINI_API_KEY
  model_name: gemini-1.5-flash
  image_width: 768
  image_height: 768
Executor Configuration
executor:
  android:
    screen_center_x: 500
    screen_center_y: 1000
    scroll_distance: 1000
    swipe_distance: 600
    long_press_duration: 1000
Environment Variables
Required API keys (set one or more):
- OPENAI_API_KEY: OpenAI GPT models
- ANTHROPIC_API_KEY: Anthropic Claude models
- GEMINI_API_KEY: Google Gemini models
- OLLAMA_MODEL_NAME: Local Ollama model name
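Before starting a task it can be useful to check which providers are actually configured. The following sketch maps the provider names used by the CLI options to the environment variables listed above; the function name `configured_providers` is a hypothetical helper, and the check only confirms that a variable is set and non-empty, not that the key is valid.

```python
import os

# Provider name -> environment variable, as listed above.
PROVIDER_ENV_VARS = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "gemini": "GEMINI_API_KEY",
    "ollama": "OLLAMA_MODEL_NAME",
}


def configured_providers(env=None):
    """Return providers whose environment variable is set and non-empty."""
    env = os.environ if env is None else env
    return [
        provider
        for provider, variable in PROVIDER_ENV_VARS.items()
        if env.get(variable, "").strip()
    ]


if __name__ == "__main__":
    found = configured_providers()
    print("Configured providers:", ", ".join(found) or "none")
```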
🎯 Model Recommendations
Based on performance testing:
| Use Case | Recommended Setup | Performance |
|---|---|---|
| Best Overall | Planner: GPT-4o, Finder: Gemini Flash | ⭐⭐⭐⭐⭐ |
| Cost Effective | Planner: GPT-4o-mini, Finder: Gemini Flash | ⭐⭐⭐⭐ |
| Privacy Focused | Planner: Ollama, Finder: Ollama | ⭐⭐⭐ |
| Speed Optimized | Planner: Gemini Flash, Finder: Gemini Flash | ⭐⭐⭐⭐ |
Notes:
- Gemini Flash offers 15 free API calls daily
- GPT-4o provides the most reliable planning
- Ollama enables fully offline operation
- Anthropic Claude offers balanced performance
📱 Examples
Android Examples
Gmail Task:
click3 run "create a draft email to [email protected] asking about lunch plans for Saturday at 1PM"
Navigation:
click3 run "open Google Maps and find bus stops in Alanson, MI"
Gaming:
click3 run "start a 3+2 chess game on lichess"
macOS Examples
Web Browsing:
click3 run "open Safari, go to news.ycombinator.com and read the top story" --platform=osx
System Tasks:
click3 run "open System Preferences and check the current display resolution" --platform=osx
🔧 Troubleshooting
Common Issues
ADB Connection Problems:
# Check device connection
adb devices
# Restart ADB server
adb kill-server
adb start-server
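The output of `adb devices` can also be checked programmatically, which helps distinguish a device that is ready from one that is attached but `unauthorized` or `offline`. This is a small sketch under the assumption that `adb` is on the PATH; the function names are illustrative and not part of the package.

```python
import subprocess


def parse_adb_devices(output):
    """Parse `adb devices` output into (serial, state) tuples."""
    devices = []
    for line in output.splitlines():
        line = line.strip()
        # Skip blank lines, the header, and daemon status messages.
        if not line or line.startswith("List of devices") or line.startswith("*"):
            continue
        parts = line.split()
        if len(parts) >= 2:
            devices.append((parts[0], parts[1]))
    return devices


def list_ready_devices():
    """Return serials of devices that adb reports in the `device` state."""
    result = subprocess.run(
        ["adb", "devices"], capture_output=True, text=True, check=True
    )
    return [serial for serial, state in parse_adb_devices(result.stdout)
            if state == "device"]
```

A serial that shows up as `unauthorized` usually means the USB debugging prompt on the phone has not been accepted yet.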
API Key Issues:
# Verify environment variables
echo $OPENAI_API_KEY
echo $GEMINI_API_KEY
# Set keys temporarily
export OPENAI_API_KEY="your-key-here"
Permission Errors (macOS):
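- Grant Accessibility permissions in System Preferences > Security & Privacy > Privacy > Accessibility (on macOS 13 and later: System Settings > Privacy & Security > Accessibility)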
- Allow Terminal or your IDE to control other applications
Model-Specific Issues:
- Ollama: Ensure the model is downloaded (ollama pull llama3.2-vision)
- Gemini: Check API quota at Google AI Studio
- OpenAI: Verify billing and usage limits
Debug Mode
Enable detailed logging:
import logging
logging.basicConfig(level=logging.DEBUG)
Performance Optimization
- Reduce image resolution in config/models.yaml
- Increase TASK_DELAY for slower devices
- Use smaller models for faster response times
🤝 Contributing
We welcome contributions! Please:
- Open an issue to discuss your idea
- Fork the repository
- Create a feature branch
- Make your changes with tests
- Submit a pull request
Development Setup
git clone https://github.com/instavm/clickclickclick
cd clickclickclick
pip install -e ".[test]"
pytest
📈 Roadmap
- [ ] iOS support via WebDriverAgent
- [ ] Windows support with Win32 APIs
- [ ] Voice command integration
- [ ] Multi-device orchestration
- [ ] Enhanced error recovery
- [ ] Plugin system for custom actions
📄 License
This project is licensed under the MIT License. See the LICENSE file for details.
🆘 Support
- 📖 Documentation: Check the examples and configuration sections
- 🐛 Bug Reports: Create an issue
- 💬 Discussions: GitHub Discussions
- ⭐ Star the repo if you find it useful!