mobile-use

Perform mobile tasks using natural language instructions.

Features

Multi-Agent System: Uses three specialized agents (Planner, Navigator, and Validator) to complete mobile tasks
Mobile Automation: Integrates with Appium for reliable mobile device automation
Streaming Responses: Provides real-time feedback as agents work on tasks
Configurable Models: Allows different LLM models for each agent type
Provider Support: Works with OpenAI, Anthropic, and other LLM providers
Structured Logging: Comprehensive logging system with namespaces for debugging and monitoring
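
The per-agent model option could be represented roughly as follows. This is only a sketch: `AgentType`, `ModelChoice`, and `resolveModel` are hypothetical names, not taken from the project's code, and the fallback values simply mirror the `DEFAULT_LLM_PROVIDER` and `DEFAULT_MODEL_NAME` settings from the sample environment file.

```typescript
// Hypothetical types; the project's real configuration shape may differ.
type AgentType = "planner" | "navigator" | "validator";

interface ModelChoice {
  provider: string; // e.g. "openai" or "anthropic"
  model: string;    // e.g. "gpt-4o"
}

// Pick the model for one agent, falling back to a shared default
// when no agent-specific override is configured.
function resolveModel(
  agent: AgentType,
  overrides: Partial<Record<AgentType, ModelChoice>>,
  fallback: ModelChoice,
): ModelChoice {
  return overrides[agent] ?? fallback;
}

const fallback: ModelChoice = { provider: "openai", model: "gpt-4o" };
const overrides: Partial<Record<AgentType, ModelChoice>> = {
  // Assumed example: a different provider for the Validator only.
  validator: { provider: "anthropic", model: "your_model_name_here" },
};
```

With these values, `resolveModel("planner", overrides, fallback)` yields the shared default, while the Validator gets its own override.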

Architecture

Agents

Planner Agent: Analyzes the user's task and creates a plan for completing it
Navigator Agent: Executes mobile actions to complete the task
Validator Agent: Verifies if the task has been completed successfully
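
The hand-off between the three agents can be sketched as a plan, execute, validate loop. The interfaces below (`Planner`, `Navigator`, `Validator`, `runTask`) and the retry-on-failure behaviour are assumptions made for illustration; the README names the roles but does not document their APIs.

```typescript
// Hypothetical interfaces: only the three role names come from the README.
interface Planner { plan(task: string): Promise<string[]>; }
interface Navigator { execute(step: string): Promise<void>; }
interface Validator { isComplete(task: string): Promise<boolean>; }

// Plan the task, execute each step, then validate.
// Re-planning up to maxRounds times is an assumption of this sketch.
async function runTask(
  task: string,
  agents: { planner: Planner; navigator: Navigator; validator: Validator },
  maxRounds = 3,
): Promise<boolean> {
  for (let round = 0; round < maxRounds; round++) {
    const steps = await agents.planner.plan(task);
    for (const step of steps) {
      await agents.navigator.execute(step);
    }
    if (await agents.validator.isComplete(task)) return true;
  }
  return false;
}

// Stub agents so the sketch runs without an LLM or a device.
const executed: string[] = [];
const stubAgents = {
  planner: { plan: async (task: string) => [`open app for: ${task}`, "tap result"] },
  navigator: { execute: async (step: string) => { executed.push(step); } },
  validator: { isComplete: async () => executed.length >= 2 },
};
```

In a real setup each stub would presumably wrap an LLM call (Planner, Validator) or an Appium session (Navigator).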

Mobile Automation

The application uses Appium for mobile automation, providing these capabilities:

App launching and navigation
Tapping on UI elements
Text input
Scrolling
Taking screenshots
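
One way to model these capabilities in code is a discriminated union with one variant per action. The `MobileAction` type and its field names are hypothetical; they illustrate the list above rather than the project's actual Appium wrapper.

```typescript
// Hypothetical action model covering the capabilities listed above.
type MobileAction =
  | { kind: "launchApp"; appId: string }
  | { kind: "tap"; x: number; y: number }
  | { kind: "typeText"; text: string }
  | { kind: "scroll"; direction: "up" | "down" }
  | { kind: "screenshot" };

// Produce a human-readable summary of an action, e.g. for streaming feedback.
function describeAction(action: MobileAction): string {
  switch (action.kind) {
    case "launchApp": return `launch ${action.appId}`;
    case "tap": return `tap at (${action.x}, ${action.y})`;
    case "typeText": return `type "${action.text}"`;
    case "scroll": return `scroll ${action.direction}`;
    case "screenshot": return "take screenshot";
    default: {
      // Compile-time exhaustiveness check: fails to type-check if a new
      // action kind is added to the union without a case above.
      const unreachable: never = action;
      throw new Error(`unhandled action: ${JSON.stringify(unreachable)}`);
    }
  }
}
```

The union keeps each action's parameters explicit, which makes it easier to validate what an agent asks the device to do before executing it.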

Logging

The application includes a structured logging system that:

Supports different log levels (debug, info, warning, error)
Uses namespaces to organize logs by component
Provides log grouping for related operations
Automatically filters debug logs in production
Helps with debugging and monitoring application behavior
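
To make these behaviours concrete, here is a minimal sketch of a namespaced logger with grouping and production filtering. It is not the project's logger: `createSketchLogger`, the `production` flag, the `sink` parameter, and the output format are all assumptions chosen so the example is self-contained.

```typescript
type Level = "debug" | "info" | "warn" | "error";
type Sink = (line: string) => void;

// Hypothetical factory: `production` and `sink` are parameters of this sketch only.
function createSketchLogger(namespace: string, production: boolean, sink: Sink = console.log) {
  let depth = 0; // current group nesting, rendered as indentation
  const emit = (level: Level, message: string): void => {
    if (production && level === "debug") return; // drop debug logs in production
    sink(`${"  ".repeat(depth)}[${level}] ${namespace}: ${message}`);
  };
  return {
    debug: (m: string) => emit("debug", m),
    info: (m: string) => emit("info", m),
    warn: (m: string) => emit("warn", m),
    error: (m: string) => emit("error", m),
    group: (label: string) => { emit("info", label); depth++; },
    groupEnd: () => { depth = Math.max(0, depth - 1); },
  };
}

// Collect output in memory instead of printing it.
const lines: string[] = [];
const log = createSketchLogger("agent:planner", true, (l) => lines.push(l));
log.debug("hidden in production");
log.group("Plan task");
log.info("Step 1");
log.groupEnd();
```

After this runs, `lines` holds the group label and the indented "Step 1" entry; the debug message is dropped because `production` is `true`.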

Getting Started

Prerequisites

Node.js 20+
Appium server
A mobile device or emulator/simulator connected to Appium

Installation

Clone the repository
Install dependencies:
```
npm install
```
Set up Appium:
```
npm run setup:appium
```

Set up environment variables:

Create a .env.local file in the root directory
Add your API keys and configuration:

```
# LLM API Keys
OPENAI_API_KEY=your_openai_api_key_here
OPENAI_BASE_URL=https://api.openai.com

# Default LLM Provider and Model
DEFAULT_LLM_PROVIDER=openai
DEFAULT_MODEL_NAME=gpt-4o

# Mobile Testing Configuration
APPIUM_PORT=4723
APPIUM_HOST=localhost
```
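
As an illustration of how these variables might be consumed, the sketch below turns an environment-like object into a typed configuration. `AppConfig` and `loadConfig` are hypothetical names, and the fallback values copy the sample file; they are assumptions, not documented defaults.

```typescript
interface AppConfig {
  llmProvider: string;
  modelName: string;
  appiumHost: string;
  appiumPort: number;
}

// Build a typed config from an environment-like object.
// Fallbacks copy the sample .env.local values (assumed, not documented defaults).
function loadConfig(env: Record<string, string | undefined>): AppConfig {
  const port = Number(env.APPIUM_PORT ?? "4723");
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid APPIUM_PORT: ${env.APPIUM_PORT}`);
  }
  return {
    llmProvider: env.DEFAULT_LLM_PROVIDER ?? "openai",
    modelName: env.DEFAULT_MODEL_NAME ?? "gpt-4o",
    appiumHost: env.APPIUM_HOST ?? "localhost",
    appiumPort: port,
  };
}
```

In a Node.js process this would typically be called as `loadConfig(process.env)`; validating the port early gives a clearer error than a failed Appium connection later.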

Start Appium server:
```
npm run appium
```

Start the development server:

```
npm run dev
```

For debugging with logs:

```
# Show all debug logs
npm run dev:debug

# Show only agent-related logs
npm run dev:agent

# Custom debug namespaces
npm run debug [namespace]

# Examples:
npm run debug                  # All logs (DEBUG=*)
npm run debug agent            # All agent logs (DEBUG=agent:*)
npm run debug agent:navigator  # Navigator agent logs (DEBUG=agent:navigator*)
npm run debug mobile,api       # Mobile and API logs (DEBUG=mobile*,api*)
```
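
The `DEBUG` values in the comments above are comma-separated wildcard patterns. The sketch below shows one plausible way such a pattern selects namespaces; `isNamespaceEnabled` is a hypothetical helper, it does not handle exclusion patterns, and it is not claimed to match the exact rules of whatever library the project uses.

```typescript
// Return true when `namespace` is selected by a comma-separated pattern,
// where `*` matches any run of characters (including none).
function isNamespaceEnabled(pattern: string, namespace: string): boolean {
  return pattern
    .split(",")
    .map((part) => part.trim())
    .filter((part) => part.length > 0)
    .some((part) => {
      // Escape regex metacharacters, then turn each escaped `*` into `.*`.
      const source = part
        .replace(/[.*+?^${}()|[\]\\]/g, "\\$&")
        .replace(/\\\*/g, ".*");
      return new RegExp(`^${source}$`).test(namespace);
    });
}
```

Under these rules `agent:*` selects `agent:navigator` but not `mobile`, and `mobile*,api*` selects any namespace starting with either prefix.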

Open http://localhost:3000 in your browser

Configuration

Click the settings icon in the top right corner
Select your preferred provider and models
Configure device capabilities
Save your settings

Usage

Enter a task in the input field (e.g., "Open the calculator app and calculate 2+2")
Click "Submit" to start the task
Watch as the agents work together to complete your task on the mobile device
View the results in real time

Examples

"Open the Settings app and turn on Airplane mode"
"Launch Chrome and search for the weather"
"Open the contacts app and create a new contact"
"Take a screenshot of the home screen"

Testing

To run tests:

```
npm test
```

Development

Logging

The application uses a structured logging system to help with debugging and monitoring:

```
import { createLogger } from '@/lib/utils/logger';

// Create a logger with a namespace
const logger = createLogger('MyComponent');

// Use different log levels
logger.debug('Detailed information for debugging');
logger.info('General information about application operation');
logger.warn('Warning about potential issues');
logger.error('Error information when something goes wrong');

// Group related logs
logger.group('Operation name');
logger.info('Step 1');
logger.info('Step 2');
logger.groupEnd();
```
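
When a grouped operation can throw, it is easy to leave a group open. A small wrapper can guarantee that `groupEnd()` is always called. The `withGroup` helper and the `GroupLogger` interface below are hypothetical; the interface only assumes the `group`, `groupEnd`, and `error` methods used in the example above.

```typescript
// Minimal shape of the logger methods used here; mirrors the calls shown above.
interface GroupLogger {
  group(label: string): void;
  groupEnd(): void;
  error(message: string): void;
}

// Run `fn` inside a log group and always close the group, even if `fn` throws.
async function withGroup<T>(
  logger: GroupLogger,
  label: string,
  fn: () => Promise<T>,
): Promise<T> {
  logger.group(label);
  try {
    return await fn();
  } catch (err) {
    logger.error(`${label} failed: ${err instanceof Error ? err.message : String(err)}`);
    throw err; // rethrow so callers still see the failure
  } finally {
    logger.groupEnd();
  }
}

// Recording logger so the sketch runs without the project's logger module.
const events: string[] = [];
const recorder: GroupLogger = {
  group: (label) => { events.push(`group:${label}`); },
  groupEnd: () => { events.push("groupEnd"); },
  error: (message) => { events.push(`error:${message}`); },
};
```

A call such as `await withGroup(logger, 'Tap element', () => tapElement())` would then keep the log output balanced; `tapElement` here stands in for any asynchronous step.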

Credits

Inspired by these GitHub projects and arXiv papers

Zurich, Dubai
