
when asking to call uber, omi should select the correct app for the request

Open kodjima33 opened this issue 11 months ago • 16 comments

I think we should start with just this on long press:

when we ask a question in chat or press the button on the device, omi should answer (as it currently does) and, if there is any action to be taken, parse the request and automatically assign it to one of the apps. Example: if you say "hey omi call uber to home", it should find the Uber app and make the request. If there are no apps that can do it, omi should say "sorry I can't do that right now"

For this, you need to build an infrastructure so that apps could have access to be activated by omi.

kodjima33 avatar Jan 25 '25 02:01 kodjima33

One question: can you give me a list of the external apps/services you want to support in this case? For example, Uber, Gmail, and so on.

ibrahimnd2000 avatar Feb 07 '25 09:02 ibrahimnd2000

@ibrahimnd2000 bro it's all the apps in our current app store, plus at least a few potential apps like Uber/Gmail etc...

kodjima33 avatar Feb 09 '25 01:02 kodjima33

/bounty $5000

kodjima33 avatar Feb 09 '25 01:02 kodjima33

💎 $5,000 bounty • omi

💎 $1,500 bounty • kodebykalab

Steps to solve:

  1. Start working: Comment /attempt #1728 with your implementation plan
  2. Submit work: Create a pull request including /claim #1728 in the PR body to claim the bounty
  3. Receive payment: 100% of the bounty is received 2-5 days post-reward. Make sure you are eligible for payouts

Thank you for contributing to BasedHardware/omi!


| Attempt | Started (GMT+0) | Solution |
| --- | --- | --- |
| 🟢 @ibrahimnd2000 | Feb 9, 2025, 10:21:05 AM | WIP |
| 🟢 @Gokul2104 | Mar 3, 2025, 2:22:50 PM | WIP |
| 🟢 @KodeByKalab | Mar 3, 2025, 8:54:29 PM | WIP |

algora-pbc[bot] avatar Feb 09 '25 01:02 algora-pbc[bot]

> @ibrahimnd2000 bro it's all the apps in our current app store and at least few potential apps like uber/gmail and etc...

Got it, I can work on it!

ibrahimnd2000 avatar Feb 09 '25 10:02 ibrahimnd2000

/attempt #1728

Implementation Plan:

To implement the described functionality, we need to create an infrastructure where Omi can parse user requests, identify the required action, and delegate it to the appropriate registered app. Here's a structured approach:


1. Natural Language Understanding (NLU) Component

Purpose: Parse user commands to extract intent, entities (e.g., app name, parameters), and context.

Implementation Steps:

  • Intent Detection: Use a machine learning model (e.g., OpenAI, Rasa, spaCy) or rule-based regex to classify intents (e.g., book_ride, send_message).
  • Entity Recognition: Extract:
    • App Name (e.g., "Uber" from "call Uber to home").
    • Parameters (e.g., destination "home").
  • Example:
    # Pseudocode for intent/entity extraction
    intent = classify_intent("hey omi call Uber to home")  # Output: "book_ride"
    entities = extract_entities("hey omi call Uber to home")  # Output: {"app": "Uber", "destination": "home"}
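
As a minimal sketch of this step, a rule-based version of the two functions above could look like the following. The intent patterns and app list are illustrative placeholders, not Omi's actual schema; a production system would likely use an ML model instead of regexes:

```python
import re

# Illustrative intent patterns; a real deployment would use a trained model
INTENT_PATTERNS = {
    "book_ride": re.compile(r"\b(call|book|get)\b.*\b(uber|lyft|ride|cab)\b", re.I),
    "send_message": re.compile(r"\b(send|text|message)\b", re.I),
}

# Hypothetical list of apps known to the assistant
KNOWN_APPS = ["Uber", "Lyft", "WhatsApp"]

def classify_intent(command):
    """Return the first intent whose pattern matches, or None."""
    for intent, pattern in INTENT_PATTERNS.items():
        if pattern.search(command):
            return intent
    return None

def extract_entities(command):
    """Pull out the target app and a naive 'to <place>' destination."""
    entities = {}
    for app in KNOWN_APPS:
        if app.lower() in command.lower():
            entities["app"] = app
    match = re.search(r"\bto\s+(\w+)", command, re.I)
    if match:
        entities["destination"] = match.group(1)
    return entities
```

With these stubs, the sample command produces the same outputs as the pseudocode above.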
    

2. App Registry

Purpose: Maintain a registry of apps, their supported intents, and API endpoints. It can be a cloud-hosted registry, such as Firebase Remote Config.

Data Structure:

apps_registry = [
    {
        "app_name": "Uber",
        "supported_intents": ["book_ride"],
        "parameters": ["destination"],
        "endpoint": "http://uber-api/book_ride",
        "triggers": ["uber"]  # Keywords to identify the app in commands
    },
    # Other apps (e.g., Lyft, WhatsApp)
]

Registration Workflow:

  1. When an app integrates with Omi, it registers its capabilities via an API:
    POST /register_app
    Body: {
        "app_name": "Uber",
        "supported_intents": ["book_ride"],
        "endpoint": "http://uber-api/book_ride",
        "triggers": ["uber"]
    }
    
  2. Store this data in a database or in-memory registry.
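
A minimal sketch of the registration logic behind that endpoint, leaving the HTTP layer out and using a plain in-memory list (the field names follow the example payload above):

```python
REQUIRED_FIELDS = {"app_name", "supported_intents", "endpoint", "triggers"}

def register_app(registry, body):
    """Validate a registration payload and store it in the in-memory registry."""
    missing = REQUIRED_FIELDS - body.keys()
    if missing:
        return {"status": "error", "missing": sorted(missing)}
    # Reject duplicate registrations by app name
    if any(a["app_name"] == body["app_name"] for a in registry):
        return {"status": "error", "reason": "already registered"}
    registry.append(body)
    return {"status": "registered", "app_name": body["app_name"]}
```

In a real deployment the `registry` list would be swapped for a database table, and the function would sit behind the `POST /register_app` route.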

3. Action Router

Purpose: Match the parsed intent/entities to a registered app and trigger its action.

Workflow:

  1. Use the extracted intent and app from NLU to search the registry.
  2. Validate required parameters (e.g., destination).
  3. If a matching app is found, call its API endpoint with parameters.
  4. If no app is found, return an error message.

Pseudocode:

import requests

def handle_command(user_command):
    intent = classify_intent(user_command)
    entities = extract_entities(user_command)
    
    # Find apps that support the intent and match the app name/trigger
    matched_apps = [
        app for app in apps_registry
        if intent in app["supported_intents"]
        and entities.get("app", "").lower() in app["triggers"]  # triggers are lowercase
    ]
    
    if not matched_apps:
        return "Sorry, I can't do that right now."
    
    app = matched_apps[0]  # Prioritize first match (or use user preferences)
    
    # Validate parameters
    missing_params = [
        param for param in app["parameters"]
        if param not in entities
    ]
    if missing_params:
        return f"Please specify: {', '.join(missing_params)}"
    
    # Call the app's API
    response = requests.post(
        app["endpoint"],
        json={param: entities[param] for param in app["parameters"]}
    )
    
    return "Done!" if response.ok else "Failed to execute."

4. App Integration Interface

Purpose: Allow apps to expose actions for Omi to trigger.

Requirements:

  • Apps must implement a REST API endpoint to handle requests.
  • Parameters must be passed in a standardized format (e.g., JSON).
  • Example Uber API:
    # Uber's backend endpoint (simplified; assumes a Flask app)
    from flask import Flask, request
    app = Flask(__name__)

    @app.route("/book_ride", methods=["POST"])
    def book_ride():
        destination = request.json["destination"]
        # Call Uber's internal booking logic
        return {"status": "success"}
    

5. Error Handling & User Feedback

  • No App Found:
    "Sorry, I can't do that right now."
  • Missing Parameters:
    "Please specify: [param1], [param2]."
  • App API Failure:
    "Failed to connect to [App Name]. Please try again later."
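
These three cases can be centralized in one small helper so every code path produces consistent wording. A sketch, with made-up internal error codes for illustration:

```python
# Hypothetical internal error codes mapped to the user-facing messages above
FEEDBACK_TEMPLATES = {
    "no_app": "Sorry, I can't do that right now.",
    "missing_params": "Please specify: {params}.",
    "app_failure": "Failed to connect to {app}. Please try again later.",
}

def feedback(error, **context):
    """Render the user-facing message for an internal error code."""
    return FEEDBACK_TEMPLATES[error].format(**context)
```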

6. Example Workflow

  1. User Command:
    "hey omi call Uber to home".
  2. NLU Output:
    intent="book_ride", entities={"app": "Uber", "destination": "home"}.
  3. App Registry Lookup:
    Find apps supporting book_ride with trigger "Uber".
  4. API Call:
    Send {"destination": "home"} to Uber's endpoint.
  5. Response:
    Success: "Your Uber ride to home is booked!"
    Failure: "Sorry, I couldn't book your Uber ride."

7. Scalability Considerations

  • Conflict Resolution: If multiple apps match (e.g., user says, "send a message"), let the user choose or set a default.
  • Security: Authenticate apps during registration (e.g., OAuth 2.0).
  • Async Actions: Use webhooks or polling for long-running tasks.
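
The conflict-resolution point could be sketched as a small helper that prefers the user's default app and otherwise signals that the user should be asked (the function and field names are illustrative):

```python
def resolve_conflict(matched_apps, default_app=None):
    """Pick one app when several support the same intent.

    Returns the chosen app dict, or None when the caller should
    prompt the user to choose between the candidates.
    """
    if len(matched_apps) == 1:
        return matched_apps[0]
    if default_app:
        for app in matched_apps:
            if app["app_name"] == default_app:
                return app
    return None  # ambiguous: ask the user
```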

This infrastructure allows Omi to dynamically delegate actions to registered apps while providing clear feedback to users.

ibrahimnd2000 avatar Feb 09 '25 10:02 ibrahimnd2000

@addbounty $1500

andrewgazelka avatar Feb 11 '25 03:02 andrewgazelka


Make a Draft PR early so others can see you are working on it! To automatically create one:

# Using npx (installed if you have NodeJS/npm)
npx bountybot start BasedHardware/omi#1728

# Or, use cargo (installed if you have Rust)
cargo install bounty
bounty start BasedHardware/omi#1728

When merged, you will receive the bounty!

addbounty avatar Feb 11 '25 03:02 addbounty

Can I simply create a plugin/service for Gmail?

Different code needs to be added for different apps. One thing I can do is build a general structure where we can plug in the respective app/service along with its code, and then create one for Gmail to start with.

Would that be enough for the bounty, or to get the code merged?

AbdulMannan19 avatar Feb 11 '25 23:02 AbdulMannan19

@AbdulMannan19 100% — different code is needed for different apps, but that's on the app developers' side.

The goal of this task is the infrastructure that app devs can build against.

kodjima33 avatar Feb 12 '25 00:02 kodjima33

/attempt #1728

I feel like handling code for every application and every prompt will be tough, since users can give free text across multiple flows.

I am not sure, but I think it must be something like browser-use: the OMI app must have more access and control over apps, and via agents it must control the phone.

For example, if the user asks to call an Uber to home, it must pick Uber and book the ride.

So my approach is: instead of NLU, we use an LLM.

The input to the LLM will be the list of apps OMI has access to, their descriptions, and the user input.

The LLM must respond with a tool call containing the app name.

Once the app is opened, I expect something like browser-use, so that the cab gets booked through live interaction.

So my flow is

USER -> OMI -> LLM -> Tool -> AppHandlerAgent -> OMI

User: Book an Uber cab to home.
OMI: Parse the command (input to LLM: the command, the list of apps with their descriptions and available intents).
LLM: call_tool app: uber, action: book ride to home.
Tool: Validate the app name and delegate the task to AppHandlerAgent.
AppHandlerAgent: Open the Uber app and navigate step by step to complete the request.

With the above flow I think we can handle any app, and if there are problems with some apps we can create separate agents for those.

I am not a mobile app developer, but if the approach is okay I can start on it while learning in parallel. @kodjima33 please let me know if I can continue with this approach.
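
The USER -> OMI -> LLM -> Tool -> AppHandlerAgent -> OMI flow above could be sketched like this; `call_llm` and the tool schema are hypothetical placeholders for whatever LLM API is actually used:

```python
# Hypothetical tool schema advertised to the LLM on every request
TOOLS = [{
    "name": "open_app",
    "description": "Open a phone app and perform an action",
    "parameters": {"app": "string", "action": "string"},
}]

# app name -> AppHandlerAgent-style callable that drives the app's UI
APP_HANDLERS = {}

def handle_user_request(command, call_llm):
    """One turn of USER -> OMI -> LLM -> Tool -> AppHandlerAgent -> OMI."""
    # The LLM sees the command plus the tool schema and replies with a tool call
    tool_call = call_llm(command, tools=TOOLS)
    handler = APP_HANDLERS.get(tool_call.get("app"))
    if handler is None:
        return "Sorry, I can't do that right now."
    # Delegate to the per-app agent, which navigates the app step by step
    return handler(tool_call["action"])
```

A real AppHandlerAgent would drive UI automation on the phone; here it is just a callable so the dispatch logic is visible.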


Gokul2104 avatar Mar 03 '25 14:03 Gokul2104

@KodeByKalab I have a few changes to the proposed system. It is nearly the same system, with some changes proposed.

  1. If we pass the available apps and intents in, the LLM doesn't need to send back the app list.
  2. I expect the conversation between omi and the user to go like this:

User: Hey omi, I need to book a cab to home. (With the available apps (Uber and Grab) and the question, omi reaches AppHandlerAgent, and AppHandlerAgent asks the user to pick one of the two apps.)

OMI: I see two apps that can do this: Uber and Grab. Which one would you like to book with?
User: Can you tell me the next cab time and the price for both? (There must be some limit on the number of apps opened here to avoid issues; omi must open both apps to check prices.)

OMI: In Uber it's $2 and in Grab it's $2.5; both will arrive in 5 minutes.
User: Book using Uber.
OMI: Booking successful.

So for the above to happen, the communication between OMI and the LLM should be multi-turn, instead of a single complete-and-update exchange.

Also, for the UI automation part: on Android this shouldn't be a problem, but on iPhone we must get approval from the App Store review team for the app to control other apps.

Security concern: it would be good to have access control where the user chooses which apps omi can use, and we register only those apps.

[Image: sequence diagram of the proposed flow]

Gokul2104 avatar Mar 04 '25 02:03 Gokul2104

/attempt #1728

The proposed OMI architecture demonstrates several strong aspects while presenting some areas for improvement. Let's analyze the key components and suggest enhancements.

Architecture Analysis

The sequence diagram effectively illustrates the multi-way conversation flow, showing clear separation of concerns between components. However, to better understand the class structure and relationships, let's visualize the core architecture:

```mermaid
classDiagram
    OMI *-- AppHandlerAgent : uses
    OMI *-- PermissionManager : uses
    class OMI{
        -AppHandlerAgent agent
        -PermissionManager permissionManager
        -Map~String,dynamic~ conversationContext
        +processUserCommand(String command)
        -_handleMultiAppSelection(List~AppInfo~ apps)
        -_executeAction(AppInfo app, Intent intent)
    }
    class AppHandlerAgent{
        <<abstract>>
        +getPrice(String appName)* Future~String~
        +executeAction(String appName, String action)* Future~void~
    }
    class PermissionManager{
        -Map~String,AppPermission~ permissions
        -Map~String,DateTime~ permissionUsage
        +registerAppPermission(String appName, List~String~ actions) Future~void~
        +validatePermission(String appName, String action) Future~bool~
        -_logPermissionEvent(String eventType, String appName) Future~void~
    }
    class IOSAppHandler{
        -PermissionManager permissionManager
        +executeAction(String app, String action) Future~void~
        -_requestUserAuthorization() Future~void~
        -_checkAppStoreReviewStatus(String app) Future~bool~
    }
    class AndroidAppHandler{
        -PermissionManager permissionManager
        +executeAction(String app, String action) Future~void~
        -_requestRuntimePermissions(String app) Future~void~
        -_validateSystemPermissions(String app) Future~bool~
    }
```

Key points about the architecture diagram:

  • Solid diamonds (♦) indicate composition relationships, showing that OMI owns instances of AppHandlerAgent and PermissionManager
  • Methods marked with (*) are abstract, requiring implementation by concrete subclasses
  • Generic types are shown with tilde notation (e.g., List~AppInfo~ represents List<AppInfo>)
  • Private members are prefixed with (-), while public members use (+)

Security Analysis

The implementation demonstrates strong security practices, particularly in permission management. Here's a visualization of the permission validation flow:

```mermaid
sequenceDiagram
    participant U as User
    participant OMI as OMI
    participant PM as PermissionManager
    participant App as AppHandlerAgent

    U->>OMI: Execute Action
    activate OMI

    OMI->>PM: Validate Permission
    activate PM

    alt iOS Platform
        PM->>PM: Check App Store Review Status
    else Android Platform
        PM->>PM: Verify Runtime Permissions
    end

    PM-->>OMI: Permission Valid
    deactivate PM

    OMI->>App: Execute Action
    App-->>OMI: Success Response
    OMI-->>U: Confirm Execution
    deactivate OMI

    Note over PM: Logs permission event
```

Key points about the permission flow diagram:

  • Vertical rectangles show when each component is actively processing
  • The alt/else section demonstrates platform-specific permission checks
  • The logging note represents an asynchronous operation that occurs during permission validation

Recommendations for Enhancement

  1. **Permission System Improvements**

```dart
class PermissionManager {
  // Add permission scope tracking
  final Map<String, Set<PermissionScope>> appScopes = {};

  Future registerAppPermission(String appName, List<String> actions) async {
    final scopes = _calculateRequiredScopes(actions);
    appScopes[appName] = scopes;

    // Implement permission hierarchy validation
    await _validateScopeHierarchy(scopes);

    permissions[appName] = AppPermission(
      appName: appName,
      allowedActions: actions,
      expiryDate: DateTime.now().add(Duration(days: 30)),
      platform: Platform.operatingSystem,
      scopes: scopes,
    );
  }

  Set<PermissionScope> _calculateRequiredScopes(List<String> actions) {
    return actions.map((action) => _getScopeForAction(action)).toSet();
  }
}
```
  2. **Enhanced Error Handling**

```dart
class AppHandlerException implements Exception {
  final String message;
  final Exception? originalError;
  final ErrorType type;

  AppHandlerException(this.message, {this.originalError, required this.type});

  factory AppHandlerException.fromPlatform(String platformMessage) {
    return AppHandlerException(platformMessage,
      type: ErrorType.PLATFORM_ERROR);
  }
}

enum ErrorType {
  PLATFORM_ERROR,
  PERMISSION_DENIED,
  NETWORK_ERROR,
  APP_UNAVAILABLE
}
```

  3. **Security Enhancements**
     - Implement rate limiting for API calls
     - Add request signing for app communications
     - Enhance logging with audit trails
     - Implement secure storage for sensitive data
  4. **Testing Strategy Improvements**

```dart
group('Permission Tests', () {
  test('permission validation handles expired permissions', () async {
    // Arrange
    final manager = PermissionManager();
    await manager.registerAppPermission(
      'test_app',
      ['action'],
      expiryDate: DateTime.now().subtract(Duration(days: 1)),
    );

    // Act & Assert
    expect(
      await manager.validatePermission('test_app', 'action'),
      isFalse,
    );
  });
});
```

### Platform-Specific Considerations

1. **iOS Implementation**
   - Implement proper Face ID/Touch ID integration
   - Handle App Store review requirements
   - Manage sandbox restrictions
   - Use Keychain for secure storage

2. **Android Implementation**
   - Implement runtime permission management
   - Handle system-level permission checks
   - Secure UI automation
   - Use Android Keystore for cryptographic operations

### Performance Considerations

1. **Cache Management**

```dart
class PermissionCache {
  final Duration cacheDuration = const Duration(hours: 1);
  final Map<String, CachedPermission> cache = {};

  Future<bool> checkCachedPermission(String appName) async {
    final cached = cache[appName];
    if (cached != null &&
        cached.timestamp.isAfter(DateTime.now().subtract(cacheDuration))) {
      return cached.isValid;
    }
    // Cache miss or stale entry: refresh from the source of truth
    final fresh = await _refreshPermission(appName); // assumed helper
    cache[appName] = fresh;
    return fresh.isValid;
  }
}
```

2. **Resource Optimization**
   - Implement lazy loading for app handlers
   - Use connection pooling for API calls
   - Optimize permission validation queries

These enhancements maintain the system's core functionality while adding robust security measures, better error handling, and improved performance characteristics.

KodeByKalab avatar Mar 04 '25 04:03 KodeByKalab

Yes @KodeByKalab. What should my next steps be? Can we connect via Discord?

Gokul2104 avatar Mar 04 '25 04:03 Gokul2104

@KodeByKalab Can i start working on it?

Gokul2104 avatar Mar 10 '25 11:03 Gokul2104

@kodjima33 Is my solution ok Can I start?

Gokul2104 avatar Mar 30 '25 01:03 Gokul2104

is this open? @kodjima33

MithilSaiReddy avatar Oct 12 '25 17:10 MithilSaiReddy

/attempt #1728

Implementation plan

1. **Goal**: When a user says "Book me a ride to home," OMI automatically picks the best ride app based on location, installed apps, and user preferences, then opens it through a secure deep link.

2. **Scope**
   - Region: USA first
   - Supported apps: Uber, Lyft, Curb, Via, Ztrip
   - Future-ready: Bolt, Grab, Ola, DiDi, InDriver
   - Core feature: book a ride (ETA, pricing, and scheduling coming later)

3. **How it works**: When the user asks for a ride (like "Book me a ride to home"):
   - OMI understands the request and identifies that it's about booking a ride.
   - It checks whether location access is allowed.
   - It looks at which ride apps are available in that region.
   - It chooses the best available app that fits the user's settings.
   - It opens that app so the user can confirm and complete the ride.

4. **User experience**: Example responses:
   - "Opening Uber to Home"
   - "Please enable Location or specify a destination."
   - "No ride apps available in your region."
   Each case returns clear, helpful suggestions instead of raw errors.

5. **Security & privacy**
   - All logic runs on-device; no server or cloud data
   - Location: "while-in-use" permission only
   - Whitelisted app schemes (Uber, Lyft, etc.)
   - No personal data logged (addresses/coordinates)
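
The app-selection step above could be sketched like this; the whitelist, URL schemes, and region codes are illustrative assumptions, not a real configuration:

```python
# Illustrative on-device whitelist of ride apps and their deep-link schemes
RIDE_APPS = [
    {"name": "Uber", "scheme": "uber://", "regions": {"US"}},
    {"name": "Lyft", "scheme": "lyft://", "regions": {"US"}},
    {"name": "Grab", "scheme": "grab://", "regions": {"SG", "MY"}},
]

def pick_ride_app(region, installed, preferred=None):
    """Choose the best available, whitelisted ride app for the region."""
    candidates = [a for a in RIDE_APPS
                  if region in a["regions"] and a["name"] in installed]
    if not candidates:
        return None  # -> "No ride apps available in your region."
    if preferred:
        for app in candidates:
            if app["name"] == preferred:
                return app
    return candidates[0]  # first whitelisted match as a simple default
```

The chosen entry's `scheme` would then be handed to the platform's deep-link opener; only schemes on the whitelist are ever launched.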

angeuwase744-ctrl avatar Oct 20 '25 15:10 angeuwase744-ctrl