
when asking to call uber, omi should select the correct app for the request

Open kodjima33 opened this issue 11 months ago • 16 comments

I think we should start with just this on long press:

when we ask a question in chat or press the button on the device, omi should answer (as it currently does) and, if there is any action to be taken, parse the request and automatically assign it to one of the apps. Example: if you say "hey omi call uber to home", it should find the Uber app and make the request. If there are no apps that can do it, omi should say "sorry I can't do that right now"

For this, you need to build an infrastructure so that apps could have access to be activated by omi.

kodjima33 avatar Jan 25 '25 02:01 kodjima33

One question: can you give me a list of the external apps/services you want to support in this case? For example, Uber, Gmail, and so on.

ibrahimnd2000 avatar Feb 07 '25 09:02 ibrahimnd2000

@ibrahimnd2000 bro it's all the apps in our current app store, plus at least a few potential apps like Uber/Gmail etc...

kodjima33 avatar Feb 09 '25 01:02 kodjima33

/bounty $5000

kodjima33 avatar Feb 09 '25 01:02 kodjima33

💎 $5,000 bounty • omi

💎 $1,500 bounty • kodebykalab

Steps to solve:

  1. Start working: Comment /attempt #1728 with your implementation plan
  2. Submit work: Create a pull request including /claim #1728 in the PR body to claim the bounty
  3. Receive payment: 100% of the bounty is received 2-5 days post-reward. Make sure you are eligible for payouts

Thank you for contributing to BasedHardware/omi!


| Attempt | Started (GMT+0) | Solution |
| --- | --- | --- |
| 🟢 @ibrahimnd2000 | Feb 9, 2025, 10:21:05 AM | WIP |
| 🟢 @Gokul2104 | Mar 3, 2025, 2:22:50 PM | WIP |
| 🟢 @KodeByKalab | Mar 3, 2025, 8:54:29 PM | WIP |

algora-pbc[bot] avatar Feb 09 '25 01:02 algora-pbc[bot]

> @ibrahimnd2000 bro it's all the apps in our current app store and at least few potential apps like uber/gmail and etc...

Got it, I can work on it!

ibrahimnd2000 avatar Feb 09 '25 10:02 ibrahimnd2000

/attempt #1728

Implementation Plan:

To implement the described functionality, we need to create an infrastructure where Omi can parse user requests, identify the required action, and delegate it to the appropriate registered app. Here's a structured approach:


1. Natural Language Understanding (NLU) Component

Purpose: Parse user commands to extract intent, entities (e.g., app name, parameters), and context.

Implementation Steps:

  • Intent Detection: Use a machine learning model (e.g., OpenAI, Rasa, spaCy) or rule-based regex to classify intents (e.g., book_ride, send_message).
  • Entity Recognition: Extract:
    • App Name (e.g., "Uber" from "call Uber to home").
    • Parameters (e.g., destination "home").
  • Example:
    # Pseudocode for intent/entity extraction
    intent = classify_intent("hey omi call Uber to home")  # Output: "book_ride"
    entities = extract_entities("hey omi call Uber to home")  # Output: {"app": "Uber", "destination": "home"}
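
As a minimal sketch of this step, a rule-based version of the two functions above could look like the following. The intent patterns and app list are illustrative placeholders, not Omi's actual schema; a production system would likely use an ML model instead of regexes:

```python
import re

# Illustrative intent patterns; a real deployment would use a trained model
INTENT_PATTERNS = {
    "book_ride": re.compile(r"\b(call|book|get)\b.*\b(uber|lyft|ride|cab)\b", re.I),
    "send_message": re.compile(r"\b(send|text|message)\b", re.I),
}

# Hypothetical list of apps known to the assistant
KNOWN_APPS = ["Uber", "Lyft", "WhatsApp"]

def classify_intent(command):
    """Return the first intent whose pattern matches, or None."""
    for intent, pattern in INTENT_PATTERNS.items():
        if pattern.search(command):
            return intent
    return None

def extract_entities(command):
    """Pull out the target app and a naive 'to <place>' destination."""
    entities = {}
    for app in KNOWN_APPS:
        if app.lower() in command.lower():
            entities["app"] = app
    match = re.search(r"\bto\s+(\w+)", command, re.I)
    if match:
        entities["destination"] = match.group(1)
    return entities
```

With these stubs, the sample command produces the same outputs as the pseudocode above.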
    

2. App Registry

Purpose: Maintain a registry of apps, their supported intents, and API endpoints. It can be a cloud-hosted registry, such as Firebase Remote Config.

Data Structure:

apps_registry = [
    {
        "app_name": "Uber",
        "supported_intents": ["book_ride"],
        "parameters": ["destination"],
        "endpoint": "http://uber-api/book_ride",
        "triggers": ["uber"]  # Keywords to identify the app in commands
    },
    # Other apps (e.g., Lyft, WhatsApp)
]

Registration Workflow:

  1. When an app integrates with Omi, it registers its capabilities via an API:
    POST /register_app
    Body: {
        "app_name": "Uber",
        "supported_intents": ["book_ride"],
        "endpoint": "http://uber-api/book_ride",
        "triggers": ["uber"]
    }
    
  2. Store this data in a database or in-memory registry.
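
A minimal sketch of the registration logic behind that endpoint, leaving the HTTP layer out and using a plain in-memory list (the field names follow the example payload above):

```python
REQUIRED_FIELDS = {"app_name", "supported_intents", "endpoint", "triggers"}

def register_app(registry, body):
    """Validate a registration payload and store it in the in-memory registry."""
    missing = REQUIRED_FIELDS - body.keys()
    if missing:
        return {"status": "error", "missing": sorted(missing)}
    # Reject duplicate registrations by app name
    if any(a["app_name"] == body["app_name"] for a in registry):
        return {"status": "error", "reason": "already registered"}
    registry.append(body)
    return {"status": "registered", "app_name": body["app_name"]}
```

In a real deployment the `registry` list would be swapped for a database table, and the function would sit behind the `POST /register_app` route.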

3. Action Router

Purpose: Match the parsed intent/entities to a registered app and trigger its action.

Workflow:

  1. Use the extracted intent and app from NLU to search the registry.
  2. Validate required parameters (e.g., destination).
  3. If a matching app is found, call its API endpoint with parameters.
  4. If no app is found, return an error message.

Pseudocode:

import requests

def handle_command(user_command):
    intent = classify_intent(user_command)
    entities = extract_entities(user_command)
    
    # Find apps that support the intent and match the app name/trigger
    matched_apps = [
        app for app in apps_registry
        if intent in app["supported_intents"]
        and entities.get("app", "").lower() in app["triggers"]  # triggers are lowercase
    ]
    
    if not matched_apps:
        return "Sorry, I can't do that right now."
    
    app = matched_apps[0]  # Prioritize first match (or use user preferences)
    
    # Validate parameters
    missing_params = [
        param for param in app["parameters"]
        if param not in entities
    ]
    if missing_params:
        return f"Please specify: {', '.join(missing_params)}"
    
    # Call the app's API
    response = requests.post(
        app["endpoint"],
        json={param: entities[param] for param in app["parameters"]}
    )
    
    return "Done!" if response.ok else "Failed to execute."

4. App Integration Interface

Purpose: Allow apps to expose actions for Omi to trigger.

Requirements:

  • Apps must implement a REST API endpoint to handle requests.
  • Parameters must be passed in a standardized format (e.g., JSON).
  • Example Uber API:
    # Uber's backend endpoint (simplified; assumes a Flask app)
    from flask import Flask, request
    app = Flask(__name__)

    @app.route("/book_ride", methods=["POST"])
    def book_ride():
        destination = request.json["destination"]
        # Call Uber's internal booking logic
        return {"status": "success"}
    

5. Error Handling & User Feedback

  • No App Found:
    "Sorry, I can't do that right now."
  • Missing Parameters:
    "Please specify: [param1], [param2]."
  • App API Failure:
    "Failed to connect to [App Name]. Please try again later."
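
These three cases can be centralized in one small helper so every code path produces consistent wording. A sketch, with made-up internal error codes for illustration:

```python
# Hypothetical internal error codes mapped to the user-facing messages above
FEEDBACK_TEMPLATES = {
    "no_app": "Sorry, I can't do that right now.",
    "missing_params": "Please specify: {params}.",
    "app_failure": "Failed to connect to {app}. Please try again later.",
}

def feedback(error, **context):
    """Render the user-facing message for an internal error code."""
    return FEEDBACK_TEMPLATES[error].format(**context)
```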

6. Example Workflow

  1. User Command:
    "hey omi call Uber to home".
  2. NLU Output:
    intent="book_ride", entities={"app": "Uber", "destination": "home"}.
  3. App Registry Lookup:
    Find apps supporting book_ride with trigger "Uber".
  4. API Call:
    Send {"destination": "home"} to Uber's endpoint.
  5. Response:
    Success: "Your Uber ride to home is booked!"
    Failure: "Sorry, I couldn't book your Uber ride."

7. Scalability Considerations

  • Conflict Resolution: If multiple apps match (e.g., user says, "send a message"), let the user choose or set a default.
  • Security: Authenticate apps during registration (e.g., OAuth 2.0).
  • Async Actions: Use webhooks or polling for long-running tasks.
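
The conflict-resolution point could be sketched as a small helper that prefers the user's default app and otherwise signals that the user should be asked (the function and field names are illustrative):

```python
def resolve_conflict(matched_apps, default_app=None):
    """Pick one app when several support the same intent.

    Returns the chosen app dict, or None when the caller should
    prompt the user to choose between the candidates.
    """
    if len(matched_apps) == 1:
        return matched_apps[0]
    if default_app:
        for app in matched_apps:
            if app["app_name"] == default_app:
                return app
    return None  # ambiguous: ask the user
```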

This infrastructure allows Omi to dynamically delegate actions to registered apps while providing clear feedback to users.

ibrahimnd2000 avatar Feb 09 '25 10:02 ibrahimnd2000

@addbounty $1500

andrewgazelka avatar Feb 11 '25 03:02 andrewgazelka


Make a Draft PR early so others can see you are working on it! To automatically create one:

# Using npx (installed if you have NodeJS/npm)
npx bountybot start BasedHardware/omi#1728

# Or, use cargo (installed if you have Rust)
cargo install bounty
bounty start BasedHardware/omi#1728

When merged, you will receive the bounty!

addbounty avatar Feb 11 '25 03:02 addbounty

Can I simply create a plugin/service for Gmail?

Different code needs to be added for different apps. One thing I can do is build a general structure where we can plug in the respective app/service along with its code, and then create one for Gmail to start with.

Would that be enough for the bounty, or to get the code merged?

AbdulMannan19 avatar Feb 11 '25 23:02 AbdulMannan19

@AbdulMannan19 100% — different code is needed for different apps, but that's on the app developers' side.

The goal of this task is the infrastructure that app devs can build against.

kodjima33 avatar Feb 12 '25 00:02 kodjima33

/attempt #1728

I feel like handling code for every application and every prompt will be tough, since users can give free text across multiple flows.

I am not sure, but I think it must be something like browser-use: the OMI app must have more access and control over apps, and via agents it must control the phone.

For example, if the user asks to call an Uber to home, it must pick Uber and book the ride.

So my approach is: instead of NLU, we use an LLM.

The input to the LLM will be the list of apps OMI has access to, their descriptions, and the user input.

The LLM must respond with a tool call containing the app name.

Once the app is opened, I expect something like browser-use, so that the cab gets booked through live interaction.

So my flow is

USER -> OMI -> LLM -> Tool -> AppHandlerAgent -> OMI

User: Book an Uber cab to home.
OMI: Parse the command (input to LLM: the command, the list of apps with their descriptions and available intents).
LLM: call_tool app: uber, action: book ride to home.
Tool: Validate the app name and delegate the task to AppHandlerAgent.
AppHandlerAgent: Open the Uber app and navigate step by step to complete the request.

With the above flow I think we can handle any app, and if there are problems with some apps we can create separate agents for those.

I am not a mobile app developer, but if the approach is okay I can start on it while learning in parallel. @kodjima33 please let me know if I can continue with this approach.
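
The USER -> OMI -> LLM -> Tool -> AppHandlerAgent -> OMI flow above could be sketched like this; `call_llm` and the tool schema are hypothetical placeholders for whatever LLM API is actually used:

```python
# Hypothetical tool schema advertised to the LLM on every request
TOOLS = [{
    "name": "open_app",
    "description": "Open a phone app and perform an action",
    "parameters": {"app": "string", "action": "string"},
}]

# app name -> AppHandlerAgent-style callable that drives the app's UI
APP_HANDLERS = {}

def handle_user_request(command, call_llm):
    """One turn of USER -> OMI -> LLM -> Tool -> AppHandlerAgent -> OMI."""
    # The LLM sees the command plus the tool schema and replies with a tool call
    tool_call = call_llm(command, tools=TOOLS)
    handler = APP_HANDLERS.get(tool_call.get("app"))
    if handler is None:
        return "Sorry, I can't do that right now."
    # Delegate to the per-app agent, which navigates the app step by step
    return handler(tool_call["action"])
```

A real AppHandlerAgent would drive UI automation on the phone; here it is just a callable so the dispatch logic is visible.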


Gokul2104 avatar Mar 03 '25 14:03 Gokul2104

@KodeByKalab I have a few changes to the proposed system. It is nearly the same system, with some changes proposed.

  1. If we pass the available apps and intents in, the LLM doesn't need to send back the app list.
  2. I expect the conversation between omi and the user to go like this:

User: Hey omi, I need to book a cab to home. (With the available apps (Uber and Grab) and the question, omi reaches AppHandlerAgent, and AppHandlerAgent asks the user to pick one of the two apps.)

OMI: I see two apps that can do this: Uber and Grab. Which one would you like to book with?
User: Can you tell me the next cab time and the price for both? (There must be some limit on the number of apps opened here to avoid issues; omi must open both apps to check prices.)

OMI: In Uber it's $2 and in Grab it's $2.5; both will arrive in 5 minutes.
User: Book using Uber.
OMI: Booking successful.

So for the above to happen, the communication between OMI and the LLM should be multi-turn, instead of a single complete-and-update exchange.

Also, for the UI automation part: on Android this shouldn't be a problem, but on iPhone we must get approval from the App Store review team for the app to control other apps.

Security concern: it would be good to have access control where the user chooses which apps omi can use, and we register only those apps.

[Image: sequence diagram of the proposed flow]

Gokul2104 avatar Mar 04 '25 02:03 Gokul2104

/attempt #1728

The proposed OMI architecture demonstrates several strong aspects while presenting some areas for improvement. Let's analyze the key components and suggest enhancements.

Architecture Analysis

The sequence diagram effectively illustrates the multi-way conversation flow, showing clear separation of concerns between components. However, to better understand the class structure and relationships, let's visualize the core architecture:

```mermaid
classDiagram
    OMI *-- AppHandlerAgent : uses
    OMI *-- PermissionManager : uses
    class OMI{
        -AppHandlerAgent agent
        -PermissionManager permissionManager
        -Map~String,dynamic~ conversationContext
        +processUserCommand(String command)
        -_handleMultiAppSelection(List~AppInfo~ apps)
        -_executeAction(AppInfo app, Intent intent)
    }
    class AppHandlerAgent{
        <<abstract>>
        +getPrice(String appName)* Future~String~
        +executeAction(String appName, String action)* Future~void~
    }
    class PermissionManager{
        -Map~String,AppPermission~ permissions
        -Map~String,DateTime~ permissionUsage
        +registerAppPermission(String appName, List~String~ actions) Future~void~
        +validatePermission(String appName, String action) Future~bool~
        -_logPermissionEvent(String eventType, String appName) Future~void~
    }
    class IOSAppHandler{
        -PermissionManager permissionManager
        +executeAction(String app, String action) Future~void~
        -_requestUserAuthorization() Future~void~
        -_checkAppStoreReviewStatus(String app) Future~bool~
    }
    class AndroidAppHandler{
        -PermissionManager permissionManager
        +executeAction(String app, String action) Future~void~
        -_requestRuntimePermissions(String app) Future~void~
        -_validateSystemPermissions(String app) Future~bool~
    }
```

Key points about the architecture diagram:

  • Solid diamonds (♦) indicate composition relationships, showing that OMI owns instances of AppHandlerAgent and PermissionManager
  • Methods marked with (*) are abstract, requiring implementation by concrete subclasses
  • Generic types are shown with tilde notation (e.g., List~AppInfo~ represents List<AppInfo>)
  • Private members are prefixed with (-), while public members use (+)

Security Analysis

The implementation demonstrates strong security practices, particularly in permission management. Here's a visualization of the permission validation flow:

```mermaid
sequenceDiagram
    participant U as User
    participant OMI as OMI
    participant PM as PermissionManager
    participant App as AppHandlerAgent

    U->>OMI: Execute Action
    activate OMI

    OMI->>PM: Validate Permission
    activate PM

    alt iOS Platform
        PM->>PM: Check App Store Review Status
    else Android Platform
        PM->>PM: Verify Runtime Permissions
    end

    PM-->>OMI: Permission Valid
    deactivate PM

    OMI->>App: Execute Action
    App-->>OMI: Success Response
    OMI-->>U: Confirm Execution
    deactivate OMI

    Note over PM: Logs permission event
```

Key points about the permission flow diagram:

  • Vertical rectangles show when each component is actively processing
  • The alt/else section demonstrates platform-specific permission checks
  • The logging note represents an asynchronous operation that occurs during permission validation

Recommendations for Enhancement

  1. **Permission System Improvements**

```dart
class PermissionManager {
  // Add permission scope tracking
  final Map<String, Set<PermissionScope>> appScopes = {};

  Future registerAppPermission(String appName, List<String> actions) async {
    final scopes = _calculateRequiredScopes(actions);
    appScopes[appName] = scopes;

    // Implement permission hierarchy validation
    await _validateScopeHierarchy(scopes);

    permissions[appName] = AppPermission(
      appName: appName,
      allowedActions: actions,
      expiryDate: DateTime.now().add(Duration(days: 30)),
      platform: Platform.operatingSystem,
      scopes: scopes,
    );
  }

  Set<PermissionScope> _calculateRequiredScopes(List<String> actions) {
    return actions.map((action) => _getScopeForAction(action)).toSet();
  }
}
```
  2. **Enhanced Error Handling**

```dart
class AppHandlerException implements Exception {
  final String message;
  final Exception? originalError;
  final ErrorType type;

  AppHandlerException(this.message, {this.originalError, required this.type});

  factory AppHandlerException.fromPlatform(String platformMessage) {
    return AppHandlerException(platformMessage,
      type: ErrorType.PLATFORM_ERROR);
  }
}

enum ErrorType {
  PLATFORM_ERROR,
  PERMISSION_DENIED,
  NETWORK_ERROR,
  APP_UNAVAILABLE
}
```

  3. **Security Enhancements**
     - Implement rate limiting for API calls
     - Add request signing for app communications
     - Enhance logging with audit trails
     - Implement secure storage for sensitive data
  4. **Testing Strategy Improvements**

```dart
group('Permission Tests', () {
  test('permission validation handles expired permissions', () async {
    // Arrange
    final manager = PermissionManager();
    await manager.registerAppPermission(
      'test_app',
      ['action'],
      expiryDate: DateTime.now().subtract(Duration(days: 1)),
    );

    // Act & Assert
    expect(
      await manager.validatePermission('test_app', 'action'),
      isFalse,
    );
  });
});
```

### Platform-Specific Considerations

1. **iOS Implementation**
   - Implement proper Face ID/Touch ID integration
   - Handle App Store review requirements
   - Manage sandbox restrictions
   - Use Keychain for secure storage

2. **Android Implementation**
   - Implement runtime permission management
   - Handle system-level permission checks
   - Secure UI automation
   - Use Android Keystore for cryptographic operations

### Performance Considerations

1. **Cache Management**

```dart
class PermissionCache {
  final Duration cacheDuration = const Duration(hours: 1);
  final Map<String, CachedPermission> cache = {};

  Future<bool> checkCachedPermission(String appName) async {
    final cached = cache[appName];
    if (cached != null &&
        cached.timestamp.isAfter(DateTime.now().subtract(cacheDuration))) {
      return cached.isValid;
    }
    // Cache miss or stale entry: refresh from the source of truth
    final fresh = await _refreshPermission(appName); // assumed helper
    cache[appName] = fresh;
    return fresh.isValid;
  }
}
```

2. **Resource Optimization**
   - Implement lazy loading for app handlers
   - Use connection pooling for API calls
   - Optimize permission validation queries

These enhancements maintain the system's core functionality while adding robust security measures, better error handling, and improved performance characteristics.

KodeByKalab avatar Mar 04 '25 04:03 KodeByKalab

Yes @KodeByKalab. What should my next steps be? Can we connect via Discord?

Gokul2104 avatar Mar 04 '25 04:03 Gokul2104

@KodeByKalab Can i start working on it?

Gokul2104 avatar Mar 10 '25 11:03 Gokul2104

@kodjima33 Is my solution ok Can I start?

Gokul2104 avatar Mar 30 '25 01:03 Gokul2104

is this open? @kodjima33

MithilSaiReddy avatar Oct 12 '25 17:10 MithilSaiReddy

/attempt #1728

Implementation plan

1. **Goal**: When a user says "Book me a ride to home," OMI automatically picks the best ride app based on location, installed apps, and user preferences, then opens it through a secure deep link.

2. **Scope**
   - Region: USA first
   - Supported apps: Uber, Lyft, Curb, Via, Ztrip
   - Future-ready: Bolt, Grab, Ola, DiDi, InDriver
   - Core feature: book a ride (ETA, pricing, and scheduling coming later)

3. **How it works**: When the user asks for a ride (like "Book me a ride to home"):
   - OMI understands the request and identifies that it's about booking a ride.
   - It checks whether location access is allowed.
   - It looks at which ride apps are available in that region.
   - It chooses the best available app that fits the user's settings.
   - It opens that app so the user can confirm and complete the ride.

4. **User experience**: Example responses:
   - "Opening Uber to Home"
   - "Please enable Location or specify a destination."
   - "No ride apps available in your region."
   Each case returns clear, helpful suggestions instead of raw errors.

5. **Security & privacy**
   - All logic runs on-device; no server or cloud data
   - Location: "while-in-use" permission only
   - Whitelisted app schemes (Uber, Lyft, etc.)
   - No personal data logged (addresses/coordinates)
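
The app-selection step above could be sketched like this; the whitelist, URL schemes, and region codes are illustrative assumptions, not a real configuration:

```python
# Illustrative on-device whitelist of ride apps and their deep-link schemes
RIDE_APPS = [
    {"name": "Uber", "scheme": "uber://", "regions": {"US"}},
    {"name": "Lyft", "scheme": "lyft://", "regions": {"US"}},
    {"name": "Grab", "scheme": "grab://", "regions": {"SG", "MY"}},
]

def pick_ride_app(region, installed, preferred=None):
    """Choose the best available, whitelisted ride app for the region."""
    candidates = [a for a in RIDE_APPS
                  if region in a["regions"] and a["name"] in installed]
    if not candidates:
        return None  # -> "No ride apps available in your region."
    if preferred:
        for app in candidates:
            if app["name"] == preferred:
                return app
    return candidates[0]  # first whitelisted match as a simple default
```

The chosen entry's `scheme` would then be handed to the platform's deep-link opener; only schemes on the whitelist are ever launched.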

angeuwase744-ctrl avatar Oct 20 '25 15:10 angeuwase744-ctrl