
Make omi work on macOS

Open kodjima33 opened this issue 9 months ago • 12 comments

The Omi app runs on macOS perfectly, but when the "try with phone microphone" button is clicked, nothing is transcribed

What data it should capture:

  • [ ] from microphone
  • [ ] system audio

Just like Granola does it

/bounty $1000


-- thinh's comment: clarify the requirements https://github.com/BasedHardware/omi/issues/2010#issuecomment-2777470428

kodjima33 avatar Mar 13 '25 22:03 kodjima33

💎 $500 bounty • omi

Steps to solve:

  1. Start working: Comment /attempt #2010 with your implementation plan
  2. Submit work: Create a pull request including /claim #2010 in the PR body to claim the bounty
  3. Receive payment: 100% of the bounty is received 2-5 days post-reward. Make sure you are eligible for payouts

❗ Important guidelines:

  • To claim a bounty, you need to provide a short demo video of your changes in your pull request
  • If anything is unclear, ask for clarification before starting as this will help avoid potential rework
  • Low quality AI PRs will not receive review and will be closed
  • Do not ask to be assigned unless you've contributed before

Thank you for contributing to BasedHardware/omi!

Attempt | Started (UTC) | Solution | Actions
🟢 @cscoderr | Apr 01, 2025, 12:48:13 PM | #2141 | Reward
🔴 @deekshatomer | Mar 16, 2025, 05:56:03 AM | WIP |

algora-pbc[bot] avatar Mar 13 '25 22:03 algora-pbc[bot]

The issue occurs because the package used for recording doesn't support macOS, which is why transcription isn't working. My plan is to either find an alternative package that supports macOS or add macOS support to the current package.

/attempt #2010

cscoderr avatar Mar 13 '25 23:03 cscoderr

/attempt #2010

deekshatomer avatar Mar 16 '25 05:03 deekshatomer

I managed to get recording working on macOS and made some configuration changes. Some were straightforward, just copying the existing iOS setup. However, Firebase needs to be configured specifically for macOS, and the bundle ID also requires configuration for the other social integrations to work. Despite this, I was able to get it running using most of the iOS configuration. This is what I have on my end:

https://github.com/user-attachments/assets/3d36559f-a86c-4fe4-a46e-3ccd92ba5ecd

@kodjima33

cscoderr avatar Mar 17 '25 13:03 cscoderr

https://github.com/BasedHardware/omi/pull/2045#issuecomment-2746817026

beastoin avatar Mar 24 '25 03:03 beastoin

💡 @cscoderr submitted a pull request that claims the bounty. You can visit your bounty board to reward.

algora-pbc[bot] avatar Apr 01 '25 12:04 algora-pbc[bot]

@kodjima33 Is the current macOS UI okay, or would you like me to update it to make it look more like a native Mac app?

cscoderr avatar Apr 01 '25 13:04 cscoderr

Folks, I just want to clarify a bit.

Objective: The Omi AI app should work seamlessly with the audio system on macOS.

Key results:

  1. Captures system audio on macOS for the meetings use case.
  2. Works on macOS with all core features: recording, transcribing, chat, apps.

References: https://www.granola.ai app

Tips: Check all references, and make sure you ask questions to clarify everything about the descriptions (a.k.a. the requirements) before jumping to the implementation.

@cscoderr @deekshatomer

beastoin avatar Apr 04 '25 03:04 beastoin

Increasing bounty to $1,000

kodjima33 avatar Apr 22 '25 02:04 kodjima33

Can I get this assigned to me?

Wolfof420Street avatar Apr 22 '25 07:04 Wolfof420Street

@beastoin @kodjima33 My previous implementation addressed the objective: "The Omi AI app should work seamlessly with the audio system on macOS." However, I got a bit confused when you mentioned integrating macOS-specific UI. If that's still part of the objective, I'm happy to reopen it and make the necessary updates. Just let me know.

cscoderr avatar Apr 22 '25 08:04 cscoderr

Some thoughts: we first need to capture audio streams from both 1) the microphone and 2) system audio on macOS. No need for Opus encoding here; we can send the PCM stream to the transcription service. For maximum accuracy, we should run each stream (i.e. the "me" audio and the "them" audio) independently, then merge (this is how Granola works). This means there will be two websocket connections to the Deepgram endpoint, and transcription cost will therefore be doubled.

Now, there is an alternative: we can also consider running transcription locally via a Whisper model (https://github.com/argmaxinc/WhisperKit is excellent). Whisper-v3-large should be "good enough" for most conversations in English and other major languages.
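A note on the "send the PCM stream" point: Deepgram's streaming API accepts raw 16-bit little-endian signed PCM (its `linear16` encoding), so the Float32 samples a macOS capture session typically produces need a small conversion before being written to the websocket. A minimal sketch of that conversion (Python for illustration only; the app's native layer would do this in Swift or Dart, and `float_to_pcm16` is a hypothetical helper name, not from the omi codebase):

```python
import struct

def float_to_pcm16(samples):
    """Convert float samples in [-1.0, 1.0] to 16-bit little-endian
    signed PCM bytes, the raw "linear16" format Deepgram's streaming
    API accepts, so no Opus encoding step is needed."""
    out = bytearray()
    for s in samples:
        # Clamp to the valid range, then scale to the Int16 range.
        clamped = max(-1.0, min(1.0, s))
        out += struct.pack("<h", int(clamped * 32767))
    return bytes(out)
```

Each converted chunk would then be written to the per-stream websocket connection as binary frames.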

@beastoin wdyt?
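The merge step described above (two independently transcribed streams combined into one conversation) can be sketched as a timestamp-ordered merge. This is an illustrative sketch only, again in Python rather than the app's Swift/Dart; the `Segment` type and its fields are assumptions, not types from the omi codebase:

```python
from dataclasses import dataclass

@dataclass
class Segment:
    start: float   # seconds from session start
    speaker: str   # "me" (microphone) or "them" (system audio)
    text: str

def merge_streams(me, them):
    """Merge two independently transcribed streams into a single
    conversation, ordered by each segment's start time."""
    merged = []
    i = j = 0
    while i < len(me) and j < len(them):
        if me[i].start <= them[j].start:
            merged.append(me[i])
            i += 1
        else:
            merged.append(them[j])
            j += 1
    # Append whatever remains of the longer stream.
    merged.extend(me[i:])
    merged.extend(them[j:])
    return merged
```

Since each stream arrives already sorted from its own transcription session, a single linear pass suffices; no global re-sort is needed.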

moona3k avatar Apr 30 '25 00:04 moona3k

#2443: App on TestFlight, records both system + mic audio

mdmohsin7 avatar Jun 01 '25 22:06 mdmohsin7

still not done

kodjima33 avatar Jun 03 '25 02:06 kodjima33