When asked to call Uber, omi should select the correct app for the request
I think we should start with just this on long press:
When we ask a question in chat or click a button on the device, omi should answer (as it currently does), and if there is any action to take, it should parse the request and automatically assign it to one of the apps. Example: if you say "hey omi call uber to home", it should find the Uber app and make the request. If there are no apps that can do it, omi should say "sorry I can't do that right now".
For this, you need to build infrastructure so that apps can register to be activated by omi.
One question: can you give me a list of external apps/services you want to support in this case? Such as Uber, Gmail, or so.
@ibrahimnd2000 bro it's all the apps in our current app store, and at least a few potential apps like Uber/Gmail etc...
/bounty $5000
💎 $5,000 bounty • omi
💎 $1,500 bounty • kodebykalab
Steps to solve:
- Start working: Comment `/attempt #1728` with your implementation plan
- Submit work: Create a pull request including `/claim #1728` in the PR body to claim the bounty
- Receive payment: 100% of the bounty is received 2-5 days post-reward. Make sure you are eligible for payouts
Thank you for contributing to BasedHardware/omi!
| Attempt | Started (GMT+0) | Solution |
|---|---|---|
| 🟢 @ibrahimnd2000 | Feb 9, 2025, 10:21:05 AM | WIP |
| 🟢 @Gokul2104 | Mar 3, 2025, 2:22:50 PM | WIP |
| 🟢 @KodeByKalab | Mar 3, 2025, 8:54:29 PM | WIP |
Got it, I can work on it!
/attempt #1728
Implementation Plan:
To implement the described functionality, we need to create an infrastructure where Omi can parse user requests, identify the required action, and delegate it to the appropriate registered app. Here's a structured approach:
1. Natural Language Understanding (NLU) Component
Purpose: Parse user commands to extract intent, entities (e.g., app name, parameters), and context.
Implementation Steps:
- Intent Detection: Use a machine learning model (e.g., OpenAI, Rasa, spaCy) or rule-based regex to classify intents (e.g., `book_ride`, `send_message`).
- Entity Recognition: Extract:
  - App Name (e.g., "Uber" from "call Uber to home").
  - Parameters (e.g., destination "home").
- Example:

```python
# Pseudocode for intent/entity extraction
intent = classify_intent("hey omi call Uber to home")
# Output: "book_ride"
entities = extract_entities("hey omi call Uber to home")
# Output: {"app": "Uber", "destination": "home"}
```
2. App Registry
Purpose: Maintain a registry of apps, their supported intents, and API endpoints. It can be a cloud-hosted registry, e.g. backed by Firebase Remote Config.
Data Structure:

```python
apps_registry = [
    {
        "app_name": "Uber",
        "supported_intents": ["book_ride"],
        "parameters": ["destination"],
        "endpoint": "http://uber-api/book_ride",
        "triggers": ["uber"]  # Keywords to identify the app in commands
    },
    # Other apps (e.g., Lyft, WhatsApp)
]
```
Registration Workflow:
- When an app integrates with Omi, it registers its capabilities via an API:

```
POST /register_app
Body: {
  "app_name": "Uber",
  "supported_intents": ["book_ride"],
  "endpoint": "http://uber-api/book_ride",
  "triggers": ["uber"]
}
```

- Store this data in a database or in-memory registry.
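As a rough sketch of the registration side (the `register_app` helper and its validation rules are illustrative assumptions, not the actual Omi API):

```python
# Minimal in-memory app registry sketch. Field names mirror the
# registration payload above; the validation rules are assumptions.
apps_registry = []

REQUIRED_FIELDS = {"app_name", "supported_intents", "endpoint", "triggers"}

def register_app(payload: dict) -> dict:
    """Validate and store an app's capabilities; return the stored record."""
    missing = REQUIRED_FIELDS - payload.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    record = {
        "app_name": payload["app_name"],
        "supported_intents": list(payload["supported_intents"]),
        "parameters": list(payload.get("parameters", [])),
        "endpoint": payload["endpoint"],
        # Normalize triggers to lowercase so later matching is case-insensitive.
        "triggers": [t.lower() for t in payload["triggers"]],
    }
    apps_registry.append(record)
    return record
```

A real deployment would back this with a database or cloud config store rather than a module-level list.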
3. Action Router
Purpose: Match the parsed intent/entities to a registered app and trigger its action.
Workflow:
- Use the extracted `intent` and `app` from NLU to search the registry.
- Validate required parameters (e.g., `destination`).
- If a matching app is found, call its API endpoint with parameters.
- If no app is found, return an error message.
Pseudocode:
def handle_command(user_command):
intent = classify_intent(user_command)
entities = extract_entities(user_command)
# Find apps that support the intent and match the app name/trigger
matched_apps = [
app for app in apps_registry
if intent in app["supported_intents"]
and entities["app"] in app["triggers"]
]
if not matched_apps:
return "Sorry, I can't do that right now."
app = matched_apps[0] # Prioritize first match (or use user preferences)
# Validate parameters
missing_params = [
param for param in app["parameters"]
if param not in entities
]
if missing_params:
return f"Please specify: {', '.join(missing_params)}"
# Call the app's API
response = requests.post(
app["endpoint"],
json={param: entities[param] for param in app["parameters"]}
)
return "Done!" if response.ok else "Failed to execute."
4. App Integration Interface
Purpose: Allow apps to expose actions for Omi to trigger.
Requirements:
- Apps must implement a REST API endpoint to handle requests.
- Parameters must be passed in a standardized format (e.g., JSON).
- Example Uber API:

```python
# Uber's backend endpoint (simplified)
@app.route("/book_ride", methods=["POST"])
def book_ride():
    destination = request.json["destination"]
    # Call Uber's internal booking logic
    return {"status": "success"}
```
5. Error Handling & User Feedback
- No App Found: "Sorry, I can't do that right now."
- Missing Parameters: "Please specify: [param1], [param2]."
- App API Failure: "Failed to connect to [App Name]. Please try again later."
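These three feedback cases could be centralized in one helper so every code path emits consistent wording (a sketch; the `feedback` function and outcome names are assumptions):

```python
# Map router outcomes to the user-facing messages listed above.
# `ctx` carries whatever the message template needs (params, app name).
def feedback(outcome: str, **ctx) -> str:
    if outcome == "no_app":
        return "Sorry, I can't do that right now."
    if outcome == "missing_params":
        return "Please specify: " + ", ".join(ctx["params"])
    if outcome == "api_failure":
        return f"Failed to connect to {ctx['app_name']}. Please try again later."
    raise ValueError(f"unknown outcome: {outcome}")
```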
6. Example Workflow
- User Command: "hey omi call Uber to home".
- NLU Output: `intent="book_ride"`, `entities={"app": "Uber", "destination": "home"}`.
- App Registry Lookup: Find apps supporting `book_ride` with trigger "Uber".
- API Call: Send `{"destination": "home"}` to Uber's endpoint.
- Response:
  - Success: "Your Uber ride to home is booked!"
  - Failure: "Sorry, I couldn't book your Uber ride."
7. Scalability Considerations
- Conflict Resolution: If multiple apps match (e.g., user says, "send a message"), let the user choose or set a default.
- Security: Authenticate apps during registration (e.g., OAuth 2.0).
- Async Actions: Use webhooks or polling for long-running tasks.
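The conflict-resolution bullet could look like this in practice (a sketch; `resolve_app` and the `user_defaults` mapping are assumptions):

```python
# Prefer the user's configured default app when several registered apps
# match an intent; return None so the caller can ask the user to choose.
def resolve_app(matched_apps, user_defaults, intent):
    if len(matched_apps) == 1:
        return matched_apps[0]
    default_name = user_defaults.get(intent)
    for app in matched_apps:
        if app["app_name"] == default_name:
            return app
    return None  # ambiguous: prompt the user to pick an app
```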
This infrastructure allows Omi to dynamically delegate actions to registered apps while providing clear feedback to users.
@addbounty $1500
Make a Draft PR early so others can see you are working on it! To automatically create one:
# Using npx (installed if you have NodeJS/npm)
npx bountybot start BasedHardware/omi#1728
# Or, use cargo (installed if you have Rust)
cargo install bounty
bounty start BasedHardware/omi#1728
When merged, you will receive the bounty!
Can I simply create a plugin/service for Gmail?
Different code needs to be added for different apps. One thing I can do is build a general structure where we can simply add the respective app/service along with its code, and then create one for Gmail to start with.
Will that be enough for the bounty, or to get the code merged?
@AbdulMannan19 100% different code is needed for different apps, but that's on the app developers' side.
The goal of this task is the infrastructure that app devs can build on.
/attempt #1728
I feel like handling code for every application and every prompt will be tough, since users can give free text across multiple flows.
I'm not sure exactly what's possible, but I think it must be something like browser-use: the OMI app must have more control over apps, and it must drive the phone via agents.
For example, if the user asks "call uber to home", it must pick Uber and book the ride.
So my approach is: instead of NLU, we should use an LLM.
The input to the LLM will be the list of apps omi has access to, their descriptions, and the user input.
The LLM responds with a tool call containing the app name.
Once the app is opened, I expect something like browser-use, so that the cab gets booked through live interaction.
So my flow is:
USER -> OMI -> LLM -> Tool -> AppHandlerAgent -> OMI
- User: Book an Uber cab to home.
- Omi: Parse command (input to LLM: the command, plus the list of apps with their descriptions and available intents).
- LLM: call_tool app: uber, action: book ride to home.
- Tool: Validate the app name and delegate the task to AppHandlerAgent.
- AppHandlerAgent: Open the Uber app and navigate step by step to complete the request.
With the above flow I think we can handle any app, and if there are problems with some apps we can create separate agents for those.
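The USER -> OMI -> LLM -> Tool -> AppHandlerAgent flow can be sketched roughly like this (the `fake_llm` stub and the tool-call shape are assumptions standing in for a real tool-calling LLM and agent):

```python
# `fake_llm` stands in for a real tool-calling LLM: given the command and
# the app descriptions, a real model would pick the right app and action.
# Here we just match the first app whose name appears in the command.
def fake_llm(command, app_descriptions):
    for name in app_descriptions:
        if name.lower() in command.lower():
            return {"app": name, "action": command}
    return None

def route_command(command, handlers, app_descriptions):
    """OMI-side router: ask the LLM for a tool call, validate the app name,
    then delegate to the matching AppHandlerAgent callable."""
    call = fake_llm(command, app_descriptions)
    if call is None or call["app"] not in handlers:
        return "Sorry, I can't do that right now."
    return handlers[call["app"]](call["action"])
```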
I am not a mobile app developer, but if the approach is okay I can start on it while learning in parallel. @kodjima33 please let me know if I can continue with this approach.
@KodeByKalab I have a few changes to the proposed system. It is nearly the same system, with some changes proposed:
- If we pass the available apps and intents to the LLM, we don't need it to send back an app list.
- I expect the conversation between omi and the user to go like this:
  - User: Hey omi, I need to book a cab to home. (With the available apps (Uber and Grab) and the question, omi reaches AppHandlerAgent, and AppHandlerAgent asks the user to pick one of the two apps.)
  - OMI: I see two apps for this action, Uber and Grab. Which one would you like to book?
  - User: Can you tell me the next cab time and the price on both? (There must be a limit on the number of apps opened, to avoid issues.) (omi must open both apps to fetch prices.)
  - OMI: Uber is $2 and Grab is $2.5; both will arrive in 5 minutes.
  - User: Book using Uber.
  - Omi: Booking successful.
So for the above to happen, the communication between OMI and the LLM should be multi-turn, instead of a single complete-and-update pass.
Also, for the UI automation part there shouldn't be a problem on Android, but on iPhone we must get approval from the App Store review team to drive other apps.
Security concern: there should be access control where the user chooses which apps omi can use, and we register only those apps.
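That access-control idea could be as simple as a user-managed allowlist consulted before any app is registered or driven (a sketch; the class and method names are assumptions):

```python
# Only apps the user has explicitly allowed may be used by omi.
class AppAccessControl:
    def __init__(self):
        self._allowed = set()

    def grant(self, app_name: str):
        self._allowed.add(app_name)

    def revoke(self, app_name: str):
        self._allowed.discard(app_name)

    def can_use(self, app_name: str) -> bool:
        return app_name in self._allowed
```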
/attempt 1728
The proposed OMI architecture demonstrates several strong aspects while presenting some areas for improvement. Let's analyze the key components and suggest enhancements.
Architecture Analysis
The sequence diagram effectively illustrates the multi-way conversation flow, showing clear separation of concerns between components. However, to better understand the class structure and relationships, let's visualize the core architecture:
```mermaid
classDiagram
    OMI *-- AppHandlerAgent : uses
    OMI *-- PermissionManager : uses
    class OMI{
        -AppHandlerAgent agent
        -PermissionManager permissionManager
        -Map~String,dynamic~ conversationContext
        +processUserCommand(String command)
        -_handleMultiAppSelection(List~AppInfo~ apps)
        -_executeAction(AppInfo app, Intent intent)
    }
    class AppHandlerAgent{
        <<abstract>>
        +getPrice(String appName)* Future~String~
        +executeAction(String appName, String action)* Future~void~
    }
    class PermissionManager{
        -Map~String,AppPermission~ permissions
        -Map~String,DateTime~ permissionUsage
        +registerAppPermission(String appName, List~String~ actions) Future~void~
        +validatePermission(String appName, String action) Future~bool~
        -_logPermissionEvent(String eventType, String appName) Future~void~
    }
    class IOSAppHandler{
        -PermissionManager permissionManager
        +executeAction(String app, String action) Future~void~
        -_requestUserAuthorization() Future~void~
        -_checkAppStoreReviewStatus(String app) Future~bool~
    }
    class AndroidAppHandler{
        -PermissionManager permissionManager
        +executeAction(String app, String action) Future~void~
        -_requestRuntimePermissions(String app) Future~void~
        -_validateSystemPermissions(String app) Future~bool~
    }
```
Key points about the architecture diagram:
- Solid diamonds (♦) indicate composition relationships, showing that OMI owns instances of AppHandlerAgent and PermissionManager
- Methods marked with (*) are abstract, requiring implementation by concrete subclasses
- Generic types are shown with tilde notation (e.g., `List~AppInfo~` represents `List<AppInfo>`)
- Private members are prefixed with (-), while public members use (+)
Security Analysis
The implementation demonstrates strong security practices, particularly in permission management. Here's a visualization of the permission validation flow:
```mermaid
sequenceDiagram
    participant U as User
    participant OMI as OMI
    participant PM as PermissionManager
    participant App as AppHandlerAgent
    U->>OMI: Execute Action
    activate OMI
    OMI->>PM: Validate Permission
    activate PM
    alt iOS Platform
        PM->>PM: Check App Store Review Status
    else Android Platform
        PM->>PM: Verify Runtime Permissions
    end
    PM-->>OMI: Permission Valid
    deactivate PM
    OMI->>App: Execute Action
    App-->>OMI: Success Response
    OMI-->>U: Confirm Execution
    deactivate OMI
    Note over PM: Logs permission event
```
Key points about the permission flow diagram:
- Vertical rectangles show when each component is actively processing
- The alt/else section demonstrates platform-specific permission checks
- The logging note represents an asynchronous operation that occurs during permission validation
### Recommendations for Enhancement

1. **Permission System Improvements**

```dart
class PermissionManager {
  // Add permission scope tracking
  final Map<String, Set<PermissionScope>> appScopes = {};

  Future<void> registerAppPermission(String appName, List<String> actions) async {
    final scopes = _calculateRequiredScopes(actions);
    // Implement permission hierarchy validation
    await _validateScopeHierarchy(scopes);
    permissions[appName] = AppPermission(
      appName: appName,
      allowedActions: actions,
      expiryDate: DateTime.now().add(Duration(days: 30)),
      platform: Platform.operatingSystem,
      scopes: scopes,
    );
  }

  Set<PermissionScope> _calculateRequiredScopes(List<String> actions) {
    return actions.map((action) => _getScopeForAction(action)).toSet();
  }
}
```
2. **Enhanced Error Handling**

```dart
class AppHandlerException implements Exception {
  final String message;
  final Exception? originalError;
  final ErrorType type;

  AppHandlerException(this.message, {this.originalError, required this.type});

  factory AppHandlerException.fromPlatform(String platformMessage) {
    return AppHandlerException(platformMessage,
        type: ErrorType.PLATFORM_ERROR);
  }
}

enum ErrorType {
  PLATFORM_ERROR,
  PERMISSION_DENIED,
  NETWORK_ERROR,
  APP_UNAVAILABLE
}
```
3. **Security Enhancements**
   - Implement rate limiting for API calls
   - Add request signing for app communications
   - Enhance logging with audit trails
   - Implement secure storage for sensitive data

4. **Testing Strategy Improvements**

```dart
group('Permission Tests', () {
  test('permission validation handles expired permissions', () async {
    // Arrange
    final manager = PermissionManager();
    await manager.registerAppPermission(
      'test_app',
      ['action'],
      expiryDate: DateTime.now().subtract(Duration(days: 1)),
    );

    // Act & Assert
    expect(
      await manager.validatePermission('test_app', 'action'),
      isFalse,
    );
  });
});
```
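The rate-limiting recommendation above could start from a plain token bucket (sketched in Python for brevity; the class is an illustrative assumption, not existing Omi code):

```python
import time

# Token-bucket rate limiter: each call to allow() consumes one token;
# tokens refill continuously at `rate` per second, up to `capacity`.
class RateLimiter:
    def __init__(self, rate: float, capacity: int):
        self.rate = rate            # tokens added per second
        self.capacity = capacity    # maximum burst size
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

In practice one limiter would be kept per registered app, keyed by app name.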
### Platform-Specific Considerations

1. **iOS Implementation**
   - Implement proper Face ID/Touch ID integration
   - Handle App Store review requirements
   - Manage sandbox restrictions
   - Use Keychain for secure storage

2. **Android Implementation**
   - Implement runtime permission management
   - Handle system-level permission checks
   - Secure UI automation
   - Use Android Keystore for cryptographic operations
### Performance Considerations
1. **Cache Management**```dart
class PermissionCache {
final Duration cacheDuration = const Duration(hours: 1);
final Map<String, CachedPermission> cache = {};
Future<bool> checkCachedPermission(String appName) async {
final cached = cache[appName];
if (cached != null &&
cached.timestamp.isAfter(DateTime.now().subtract(cacheDuration))) {
return cached.isValid;
}
// Refresh cache...
}
}
2. **Resource Optimization**
   - Implement lazy loading for app handlers
   - Use connection pooling for API calls
   - Optimize permission validation queries
These enhancements maintain the system's core functionality while adding robust security measures, better error handling, and improved performance characteristics.
Yes @KodeByKalab. What should my next steps be? Can we connect via Discord?
@KodeByKalab Can I start working on it?
@kodjima33 Is my solution ok? Can I start?
is this open? @kodjima33
/attempt #1728
Implementation plan

1. Goal

When a user says "Book me a ride to home," OMI automatically picks the best ride app based on location, installed apps, and user preferences, then opens it through a secure deep link.

2. Scope
- Region: USA first
- Supported apps: Uber, Lyft, Curb, Via, Ztrip
- Future-ready: Bolt, Grab, Ola, DiDi, InDriver
- Core feature: Book ride (ETA, pricing, and scheduling coming later)
3. How It Works

When the user asks for a ride (like "Book me a ride to home"):
- OMI understands the request and identifies that it's about booking a ride.
- It checks if location access is allowed.
- It looks at which ride apps are available in that region.
- It chooses the best available app that fits the user's settings.
- It opens that app so the user can confirm and complete the ride.
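The final step (opening the chosen app) would go through a scheme whitelist plus a deep link. A sketch in Python (the `uber://` `setPickup` parameters follow Uber's published deep-link format, but every scheme and parameter name here should be treated as an assumption to verify against each provider's docs):

```python
from urllib.parse import urlencode

# Only whitelisted URL schemes may be opened; hypothetical scheme table.
SCHEMES = {"Uber": "uber://", "Lyft": "lyft://"}

def build_ride_link(app_name: str, params: dict) -> str:
    """Build a deep link for a whitelisted ride app; reject unknown apps."""
    if app_name not in SCHEMES:
        raise ValueError(f"{app_name} is not a whitelisted ride app")
    return SCHEMES[app_name] + "?" + urlencode(params)
```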
4. User Experience
- "Opening Uber to Home"
- "Please enable Location or specify a destination."
- "No ride apps available in your region."

Each case returns clear, helpful suggestions instead of errors.
5. Security & Privacy
- All logic runs on-device; no server or cloud data
- Location: "while-in-use" permission only
- Whitelisted app schemes (Uber, Lyft, etc.)
- No personal data logged (addresses/coordinates)
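The "no personal data logged" rule can be enforced with a scrubber that every log call goes through (a sketch; the key list is an assumption):

```python
# Redact location/destination details before anything reaches a log line.
SENSITIVE_KEYS = {"destination", "address", "latitude", "longitude"}

def scrub_for_log(event: dict) -> dict:
    return {k: ("<redacted>" if k in SENSITIVE_KEYS else v)
            for k, v in event.items()}
```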