
Make omi work on macOS

Open kodjima33 opened this issue 9 months ago • 12 comments

The Omi app runs on macOS perfectly, but when the "try with phone microphone" button is clicked, nothing is transcribed

What data it should capture:

  • [ ] from microphone
  • [ ] system audio

Just like Granola does it

/bounty $1000


-- thinh's comment: clarify the requirements https://github.com/BasedHardware/omi/issues/2010#issuecomment-2777470428

kodjima33 avatar Mar 13 '25 22:03 kodjima33

💎 $500 bounty • omi

Steps to solve:

  1. Start working: Comment /attempt #2010 with your implementation plan
  2. Submit work: Create a pull request including /claim #2010 in the PR body to claim the bounty
  3. Receive payment: 100% of the bounty is received 2-5 days post-reward. Make sure you are eligible for payouts

❗ Important guidelines:

  • To claim a bounty, you need to provide a short demo video of your changes in your pull request
  • If anything is unclear, ask for clarification before starting as this will help avoid potential rework
  • Low quality AI PRs will not receive review and will be closed
  • Do not ask to be assigned unless you've contributed before

Thank you for contributing to BasedHardware/omi!

Attempt | Started (UTC) | Solution | Actions
🟢 @cscoderr | Apr 01, 2025, 12:48:13 PM | #2141 | Reward
🔴 @deekshatomer | Mar 16, 2025, 05:56:03 AM | WIP |

algora-pbc[bot] avatar Mar 13 '25 22:03 algora-pbc[bot]

The issue occurs because the package used for recording doesn't support macOS, which is why transcription isn't working. My plan is to either find an alternative package that supports macOS or add macOS support to the current package.

/attempt #2010

cscoderr avatar Mar 13 '25 23:03 cscoderr

/attempt #2010

deekshatomer avatar Mar 16 '25 05:03 deekshatomer

I managed to get recording working on macOS and made some configuration changes. Some were straightforward, just copying the existing iOS setup. However, Firebase needs to be configured specifically for macOS, and the bundle ID also requires configuration for the other social integrations to work. Despite this, I was able to get it running using most of the iOS configuration. This is what I have on my end:

https://github.com/user-attachments/assets/3d36559f-a86c-4fe4-a46e-3ccd92ba5ecd

@kodjima33

cscoderr avatar Mar 17 '25 13:03 cscoderr

https://github.com/BasedHardware/omi/pull/2045#issuecomment-2746817026

beastoin avatar Mar 24 '25 03:03 beastoin

💡 @cscoderr submitted a pull request that claims the bounty. You can visit your bounty board to reward.

algora-pbc[bot] avatar Apr 01 '25 12:04 algora-pbc[bot]

@kodjima33 Is the current macOS UI okay, or would you like me to update it to make it look more like a native Mac app?

cscoderr avatar Apr 01 '25 13:04 cscoderr

Folks, I just want to clarify a bit.

Objective: The Omi AI app should work seamlessly with the audio system on macOS.

Key results:

  1. Captures system audio on macOS for the meetings use case.
  2. Works on macOS with all core features: recording, transcribing, chat, apps.

References: https://www.granola.ai app

Tips: Check all references, and make sure you ask questions to clarify everything about the descriptions (a.k.a. the requirements) before jumping to the implementation.

@cscoderr @deekshatomer

beastoin avatar Apr 04 '25 03:04 beastoin

Increasing bounty to $1,000

kodjima33 avatar Apr 22 '25 02:04 kodjima33

Can I get this assigned to me?

Wolfof420Street avatar Apr 22 '25 07:04 Wolfof420Street

@beastoin @kodjima33 My previous implementation addressed the objective: "The Omi AI app should work seamlessly with the audio system on macOS." However, I got a bit confused when you mentioned integrating macOS-specific UI. If that's still part of the objective, I'm happy to reopen it and make the necessary updates. Just let me know.

cscoderr avatar Apr 22 '25 08:04 cscoderr

Some thoughts: we first need to capture audio streams from both 1) the microphone and 2) system audio on macOS. No need for Opus encoding here; we can send the PCM stream to the transcription service. For maximum accuracy, we should run each stream (i.e. the "me" audio and the "them" audio) independently, then merge (this is how Granola works). This means there will be two websocket connections to the Deepgram endpoint, and transcription cost will therefore be doubled.

Now, there is an alternative: we can also consider running transcription locally via a Whisper model (https://github.com/argmaxinc/WhisperKit is excellent). Whisper-v3-large should be "good enough" for most conversations in English and other major languages.
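A note on the "send the PCM stream" point: Deepgram's streaming API accepts raw 16-bit little-endian signed PCM (its `linear16` encoding), so the Float32 samples a macOS capture session typically produces need a small conversion before being written to the websocket. A minimal sketch of that conversion (Python for illustration only; the app's native layer would do this in Swift or Dart, and `float_to_pcm16` is a hypothetical helper name, not from the omi codebase):

```python
import struct

def float_to_pcm16(samples):
    """Convert float samples in [-1.0, 1.0] to 16-bit little-endian
    signed PCM bytes, the raw "linear16" format Deepgram's streaming
    API accepts, so no Opus encoding step is needed."""
    out = bytearray()
    for s in samples:
        # Clamp to the valid range, then scale to the Int16 range.
        clamped = max(-1.0, min(1.0, s))
        out += struct.pack("<h", int(clamped * 32767))
    return bytes(out)
```

Each converted chunk would then be written to the per-stream websocket connection as binary frames.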

@beastoin wdyt?
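The merge step described above (two independently transcribed streams combined into one conversation) can be sketched as a timestamp-ordered merge. This is an illustrative sketch only, again in Python rather than the app's Swift/Dart; the `Segment` type and its fields are assumptions, not types from the omi codebase:

```python
from dataclasses import dataclass

@dataclass
class Segment:
    start: float   # seconds from session start
    speaker: str   # "me" (microphone) or "them" (system audio)
    text: str

def merge_streams(me, them):
    """Merge two independently transcribed streams into a single
    conversation, ordered by each segment's start time."""
    merged = []
    i = j = 0
    while i < len(me) and j < len(them):
        if me[i].start <= them[j].start:
            merged.append(me[i])
            i += 1
        else:
            merged.append(them[j])
            j += 1
    # Append whatever remains of the longer stream.
    merged.extend(me[i:])
    merged.extend(them[j:])
    return merged
```

Since each stream arrives already sorted from its own transcription session, a single linear pass suffices; no global re-sort is needed.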

moona3k avatar Apr 30 '25 00:04 moona3k

#2443: App on TestFlight, records both system + mic audio

mdmohsin7 avatar Jun 01 '25 22:06 mdmohsin7

still not done

kodjima33 avatar Jun 03 '25 02:06 kodjima33