record with voice

Open kodjima33 opened this issue 9 months ago • 24 comments

This feature is just too good not to be added

Please add "record with voice" button

Image

/bounty $100

kodjima33 avatar Mar 05 '25 07:03 kodjima33

💎 $1,000 bounty • omi

Steps to solve:

  1. Start working: Comment /attempt #1953 with your implementation plan
  2. Submit work: Create a pull request including /claim #1953 in the PR body to claim the bounty
  3. Receive payment: 100% of the bounty is received 2-5 days post-reward. Make sure you are eligible for payouts

Thank you for contributing to BasedHardware/omi!

| Attempt | Started (GMT+0) | Solution |
| --- | --- | --- |
| 🔴 @Gyan-max | Mar 8, 2025, 1:45:55 PM | WIP |
| 🟢 @Hashino | Mar 16, 2025, 11:54:23 AM | WIP |
| 🟢 @neooriginal | Mar 17, 2025, 11:49:26 AM | #2043 |
| 🟢 @AVtheking | | #2046 |

algora-pbc[bot] avatar Mar 05 '25 07:03 algora-pbc[bot]

@kodjima33 @algora-pbc
I’d like to work on this issue and wanted to clarify a few things:

  1. Is the bounty for this issue still open?
  2. What exactly does "disable the consumer device" mean?
    • Should the device be permanently disabled, or should it be reversible?
    • Is there an API endpoint or function already handling device activation that we should modify?
    • Does disabling mean preventing further usage, turning off remotely, or something else?

Looking forward to your response. Thanks!

kotesh-arya avatar Mar 06 '25 18:03 kotesh-arya

/claim #1953

The user is asking for a brief implementation plan for the GitHub issue #1953: "record with voice" feature. Based on the files I've been shown and the prior conversation, I can see that we've already implemented much of this feature.

Here's what we've implemented so far:

  1. Created a new VoiceRecordingPage for recording audio with the phone's microphone
  2. Added methods to the CaptureProvider to process phone recordings
  3. Added a 'phone' source type to the ConversationSource enum
  4. Updated the getTag() method to display "Phone Recording" for phone-recorded conversations
  5. Added a floating action button to the home page to access the voice recording feature

My implementation plan should summarize these steps and outline what else needs to be done to complete this feature.

# Implementation Plan: Voice Recording Feature

Here's my plan to implement the voice recording feature (Issue #1953):

  1. Create VoiceRecordingPage

    • Create a new page with UI for recording audio using the phone's microphone
    • Include timer display, recording status indicator, and start/stop controls
    • Implement permissions handling for microphone access
  2. Update CaptureProvider

    • Add processPhoneRecording method to handle audio files recorded with the phone
    • Implement helper methods for creating and finalizing conversations with phone recordings
    • Send recorded audio to the server for transcription via WebSocket
  3. Update Conversation Schema

    • Add 'phone' to the ConversationSource enum to distinguish phone recordings
    • Update getTag() method to display "Phone Recording" for phone-recorded conversations
  4. Add UI Access Point

    • Add a floating action button to the home page for quick access to voice recording
    • Use red color and microphone icon to make it visually distinct
  5. Integration Testing

    • Test the complete flow from recording to transcription to conversation display
    • Verify proper error handling for permissions and audio processing
    • Confirm recordings are properly categorized as "Phone Recording" in the UI

This implementation makes recording directly with the phone easy and intuitive, while reusing the existing backend infrastructure for transcription and conversation management.
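Step 3 of the plan can be sketched roughly as follows. This is only an illustration in Python, not the app's actual code: `ConversationSource`, the `phone` value, and the "Phone Recording" label come from the plan, while the `DEVICE` member, its label, and the `get_tag` helper name are assumptions made for the example.

```python
from enum import Enum


class ConversationSource(Enum):
    # DEVICE is a hypothetical pre-existing member; only "phone" comes from the plan.
    DEVICE = "device"
    PHONE = "phone"


# Display labels keyed by source. "Phone Recording" is the label named in the plan;
# "Device Recording" is an assumed placeholder.
_TAGS = {
    ConversationSource.DEVICE: "Device Recording",
    ConversationSource.PHONE: "Phone Recording",
}


def get_tag(source: ConversationSource) -> str:
    """Return the UI label for a conversation source."""
    return _TAGS[source]
```

Keeping the labels in one mapping means a new source only needs one enum member and one label entry.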

Gyan-max avatar Mar 08 '25 13:03 Gyan-max

@Gyan-max pls never come back again here with your AI shit

kodjima33 avatar Mar 13 '25 20:03 kodjima33

/attempt #1953

Hashino avatar Mar 16 '25 11:03 Hashino

Increasing bounty to $200

/bounty $200

kodjima33 avatar Mar 17 '25 06:03 kodjima33

Increasing bounty to $1k if I get a PR today that works like a blast /bounty $1000

Important:

I need you to add a "record with voice" icon, just like in ChatGPT, that listens to the voice from the microphone. Once recording is stopped, it should receive a transcription and paste it in the chat, using the existing code that works on the "Try with Phone Mic" button.

Try not to create any new variables or modules or components

Chat page: Image

kodjima33 avatar Mar 17 '25 08:03 kodjima33

working on it, will try to get it done by today

neooriginal avatar Mar 17 '25 09:03 neooriginal

/attempt #1953

neooriginal avatar Mar 17 '25 11:03 neooriginal

@neooriginal amazing but can you make it a little bit more like this pls?

https://github.com/user-attachments/assets/5b8d77db-db4e-453a-a005-2a4fe9ad9721

@AVtheking can you share demo?

kodjima33 avatar Mar 17 '25 20:03 kodjima33

> @neooriginal amazing but can you make it a little bit more like this pls?
>
> ScreenRecording_03-17-2025.13-23-18_1.MP4
>
> @AVtheking can you share demo?

done

neooriginal avatar Mar 17 '25 22:03 neooriginal

> Using the existing code that works on "Try with Phone Mic" button - it should receive a transcription and paste it in the chat

folks, make sure you fully understand the requirements. If you want to propose a better solution that doesn't fully align with the requirement, talk to Nik to clarify it.

make sure we are on the same page.

for example, @neooriginal, did you use a new STT library that is not based on the current "Try with Phone Mic"?

beastoin avatar Mar 18 '25 01:03 beastoin

Hi @beastoin, I tried implementing it by following the requirement. Could you please take a look and check whether the approach is correct? I will upload a demo video in a while, I have an exam in a few hours 😅

AVtheking avatar Mar 18 '25 02:03 AVtheking

I already talked to Nik via Telegram. I can easily implement the backend STT. On-device is superior though:

  1. way faster
  2. cheaper for you
  3. it does not have to be 1:1 accurate like in text messages, because the AI can interpret it
  4. the Apple one has worked fine for me since Apple Intelligence
  5. works offline
  6. I'd say even better than Deepgram on Android/Pixel devices

neooriginal avatar Mar 18 '25 02:03 neooriginal

@neooriginal tell me more about 3. and 6., I doubt that on-device STT is better than DG

no worries, Nik and I are on the same page. If you have talked to him and he's okay with it, it's fine with me.

@AVtheking haha you should be faster, or Neo could take this piece of cake. Looking forward.

beastoin avatar Mar 18 '25 02:03 beastoin

> @neooriginal tell me more about 3. 6., i doubted that the on-device STT is better than DG
>
> no worries, Nik and I are on the same page, if you have talked to him and he's okay with it - it's fine with me.
>
> @AVtheking haha you should be faster or, Neo could take this piece of cake, looking forward.

Nik has not responded to me yet. He didn't seem totally against it though. Try it out yourself, and if it's bad I can change it within 5 minutes. I'm going to hurry up, need the cash

neooriginal avatar Mar 18 '25 02:03 neooriginal

@neooriginal man, could you do it yourself? If you want to take this ticket, you must be super strong on your proposed solution.

tips: tell me more about 3. and 6. based on your research, focus on the WER, and use these on-table research findings to discuss with Nik.

it would also be great if you could try the 2 approaches and compare them yourself.
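For anyone comparing the two approaches, WER (word error rate) is the word-level edit distance between a reference transcript and the STT output, divided by the number of reference words. A minimal sketch in Python, assuming plain whitespace tokenization and case-insensitive matching; the function name is only illustrative:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")
    # prev[j] is the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1] / len(ref)
```

Running the same recordings through on-device STT and the backend STT, then calling `word_error_rate(reference, output)` on each result, gives a number per approach that can be compared directly. Lower is better.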

beastoin avatar Mar 18 '25 02:03 beastoin

> @neooriginal man, could you do it yourself ? if you want to take this ticket, you must be super strong on your proposed solution.
>
> tips: tell me more about the 3. 6. based on your research, focus on the WER, and, use these on-table research findings to discuss with Nik.
>
> it also great if you could try 2 approaches and compare them yourself.

If I take this ticket and deliver good results, will I definitely be selected and get the money? I would present my research, but I do not want others to claim the ticket then. Quite the time pressure right now.

I'm working on just implementing backend STT. Probably easier.

neooriginal avatar Mar 18 '25 03:03 neooriginal

ok everything is working. ready to review

neooriginal avatar Mar 18 '25 03:03 neooriginal

Hi folks,

  • Requirement updates: The top priority for this feature is to make it work with unstable internet. So, if I record for 10 minutes and the internet stops working for 5 minutes, and then once I stop recording, I connect to the internet, it should process the audio fully. Ref: https://github.com/BasedHardware/omi/pull/2043#issuecomment-2735452559
  • Suggested solutions: https://github.com/BasedHardware/omi/pull/2043#issuecomment-2736188805 / https://github.com/BasedHardware/omi/pull/2043#issuecomment-2736281965
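One way to read the updated requirement is: keep recorded chunks locally, upload them in order, and never drop a chunk until its upload has succeeded. The sketch below illustrates that idea only. It is not the implementation from the linked PR comments, and `BufferedUploader` and the `send` callback are hypothetical names. It also keeps chunks in memory, whereas a real app would likely need to persist them to disk.

```python
from collections import deque
from typing import Callable, Deque


class BufferedUploader:
    """Keeps recorded audio chunks locally until they have been uploaded."""

    def __init__(self, send: Callable[[bytes], None]) -> None:
        # `send` is a hypothetical upload callback; it is expected to raise
        # ConnectionError when the network is unavailable.
        self._send = send
        self._pending: Deque[bytes] = deque()

    def add_chunk(self, chunk: bytes) -> None:
        """Store a chunk and opportunistically try to upload the backlog."""
        self._pending.append(chunk)
        self.flush()

    def flush(self) -> int:
        """Upload pending chunks in order, stopping at the first failure.

        Returns the number of chunks still waiting to be uploaded.
        """
        while self._pending:
            try:
                self._send(self._pending[0])
            except ConnectionError:
                break  # still offline: keep the chunk and retry later
            self._pending.popleft()  # drop only after a successful send
        return len(self._pending)
```

Calling `flush()` again when recording stops or when connectivity returns drains whatever accumulated during the outage, so a 5-minute gap in a 10-minute recording delays processing rather than losing audio.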

beastoin avatar Mar 19 '25 11:03 beastoin

@beastoin are you finishing this off, or could I try it?

AVtheking avatar Mar 20 '25 11:03 AVtheking

> @beastoin are you finishing this off , or could I try it ?

seems like he does: https://github.com/BasedHardware/omi/pull/2055/commits/9c82db689c69fd2e5cd32c707fba6465c11fadeb

neooriginal avatar Mar 20 '25 11:03 neooriginal

sorry guys, the requirement changes made it harder for you to finish this ticket in the limited time.

we need to move fast, so I have handled it myself in #2055

feel free to read it, and create a new PR to enhance it if needed.

if you need some coffee to keep your caffeine level always high - feel free to ping me https://discord.omi.me @thinh

thank you for your time.

@Hashino @neooriginal @AVtheking

beastoin avatar Mar 21 '25 03:03 beastoin

Product Change Logs

  1. Feature is ready on TestFlight / internal test

@kodjima33 congratulations 🚀

beastoin avatar Mar 21 '25 05:03 beastoin