
Make a demo of device speaking ($2000)

Open kodjima33 opened this issue 1 year ago • 4 comments

Is your feature request related to a problem? Please describe.
People want to talk to the device. Our v2 device is equipped with a speaker but doesn't speak yet.

Describe the solution you'd like
The solution should work exactly like our in-app chat, but with audio from the device. For example, I click the button (on the device) and ask "hey, what's the capital of the United States", and the device should respond with audio: "Washington DC". Both the prompt and the response should be visible in the chat inside the app.

This functionality should be disablable via settings.

Additional context
There was a PR submitted a while ago to make the app speak; take a look at that.

This is a paid task. Reward is $2,000 in cash. Simply link your PR to this issue to receive the money.

kodjima33 avatar Oct 08 '24 17:10 kodjima33

sweet!

beastoin avatar Oct 09 '24 01:10 beastoin

I can get this working but would be using Deepgram only.

DamienDeepgram avatar Oct 11 '24 05:10 DamienDeepgram

Go ahead with a PR pls @DamienDeepgram, you don't need permission to do great things.

beastoin avatar Oct 12 '24 01:10 beastoin

@hoai265 you too - great firmware dev man

beastoin avatar Oct 12 '24 01:10 beastoin

Sorry guys had to update the bounty

kodjima33 avatar Oct 14 '24 02:10 kodjima33

so sad, no one has taken this :dead: such a sweet $1K

beastoin avatar Oct 22 '24 03:10 beastoin

Hi @kodjima33 , I’d love to take this task. Please assign it to me, and I’ll get started. Thanks!

Sanchay-T avatar Oct 25 '24 03:10 Sanchay-T

@Sanchay-T yes, pls keep us updated! every 24h would be great ~ just to keep your motivation up!

beastoin avatar Oct 26 '24 04:10 beastoin

Hi @beastoin 👋

I've been diving into the codebase to understand how we can implement voice responses for the device. Really interesting architecture you've built here! While I'm familiar with backend systems, I'm getting up to speed with some of the Flutter and BLE specifics.

Looking at the current audio pipeline, I can see we're handling real-time streaming through WebSockets pretty elegantly. The socket service in app/lib/services/sockets.dart seems to be the core of this:

Future<TranscriptSegmentSocketService?> socket({
    required BleAudioCodec codec,
    required int sampleRate,
    required String language,
    bool force = false,
}) async {

For implementing the voice response feature, I think we can build on this foundation. I see we're already integrated with OpenAI's APIs in backend/http/openai.dart, which could be extended for text-to-speech capabilities.

I have a few questions about the device interaction part though. Looking at app/lib/services/devices/models.dart, I see how we're handling BLE characteristics:

BluetoothCharacteristic? getCharacteristicByUuid(BluetoothService service, String uuid) {
  return service.characteristics.firstWhereOrNull(
    (characteristic) => characteristic.uuid.str128.toLowerCase() == uuid.toLowerCase(),
  );
}

Before I proceed with the implementation, I wanted to check:

  1. For handling button presses - what would be the best way to detect when the user wants to trigger a voice command? Should we use an existing characteristic or define a new one?

  2. Regarding audio playback through the device's speaker - are there any specific format requirements or limitations I should be aware of?

  3. For the chat interface, I see we're using the Memory system to handle conversations. Would adding voice responses require any significant changes to the current schema?

I have some ideas about the implementation, but wanted to validate these core aspects first to make sure I'm heading in the right direction. Happy to elaborate on any part of this!

Thanks for the help! Looking forward to your insights.

Sanchay-T avatar Oct 27 '24 12:10 Sanchay-T

1/ What do you propose? Pros / cons.
2/ @kevvz could help? But you should try it yourself first.
3/ Just do it (to know that you're wrong 😏)

No worries man, be creative. Let's finish the first draft quickly, then we have something to discuss. Embracing the changes (good changes :))

@Sanchay-T

beastoin avatar Oct 28 '24 03:10 beastoin

Deepgram also has TTS, so I think you could use the same SDK that the speech-to-text is using. Not sure if Omi has a preference there, though.

see: https://pub.dev/packages/deepgram_speech_to_text#text-to-speech

DamienDeepgram avatar Oct 31 '24 16:10 DamienDeepgram

Removed @Sanchay-T from assigned - no progress

@beastoin let's try to not assign people if they didn't yet have PRs submitted. We assign only to those who had PRs. Others will need to do a PR first.

@DamienDeepgram try it out bro - looking forward!

kodjima33 avatar Nov 01 '24 22:11 kodjima33

Hi @beastoin and @kodjima33

I wanted to clarify the situation regarding my previous assignment. First, I apologize for the delay in updates - I was away for Diwali celebrations in my hometown, which affected my response time. However, I want to assure you that I've been actively working on this in the background:

  1. I've been going through the codebase thoroughly, particularly focusing on the audio pipeline and BLE integration
  2. While I have less experience with Flutter/Dart specifically, I bring relevant experience with speech/text models which I believe will be valuable for this feature
  3. I'm currently working on implementing a proof-of-concept to address the questions I raised earlier, particularly around:
    • Button press handling for voice command triggering
    • Audio playback implementation
    • Memory system integration for voice responses

I understand the policy about assignments and PRs, and I'm committed to submitting a PR with my implementation soon. I should have communicated my temporary absence better, and I appreciate your patience.

Would it be alright if I continue working on this feature and submit a PR for review? I'm happy to share my current progress in more detail if helpful.

Thanks for understanding!

Sanchay-T avatar Nov 02 '24 02:11 Sanchay-T

checking speaker functions of current firmware...

beastoin avatar Nov 02 '24 04:11 beastoin

> Removed @Sanchay-T from assigned - no progress
>
> @beastoin let's try to not assign people if they didn't yet have PRs submitted. We assign only to those who had PRs. Others will need to do a PR first.
>
> @DamienDeepgram try it out bro - looking forward!

Hi @kodjima33, what if we add a separate option to route audio to the phone's output, like AirPods? I think this would give users another option, since they could listen privately.

hoai265 avatar Nov 02 '24 04:11 hoai265

@DamienDeepgram don't forget to ref your PR ;)

About how to implement this task: be creative. But I think Nik's description is good/simple enough to roll out the first draft. Something like:

1/ The user presses the button on the device and says something.
2/ The device sends that voice to the app.
3/ The app sends the voice message to the backend.
4/ The backend processes the voice message, then responds to the app with audio bytes.
5/ The app sends the audio bytes to the device.
6/ The device speaks it out loud.
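The six steps above can be sketched roughly as follows. This is a minimal Python sketch, not Omi code: `VoiceRoundTrip`, `ChatTurn`, and the three callables are hypothetical stand-ins for whatever speech-to-text, chat, and text-to-speech services end up being used. It also assumes the prompt and reply are logged for the in-app chat and that the feature can be switched off in settings, as the issue description asks.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ChatTurn:
    """One prompt/response pair, as it would be shown in the in-app chat."""
    prompt_text: str
    response_text: str


@dataclass
class VoiceRoundTrip:
    # All three callables are hypothetical stand-ins for real services.
    transcribe: Callable[[bytes], str]   # speech-to-text (step 4)
    answer: Callable[[str], str]         # chat reply (step 4)
    synthesize: Callable[[str], bytes]   # text-to-speech (step 4)
    enabled: bool = True                 # assumed settings toggle
    chat_log: List[ChatTurn] = field(default_factory=list)

    def handle_button_recording(self, recorded_audio: bytes) -> bytes:
        """Take audio recorded on the device, return audio bytes to play back."""
        if not self.enabled:
            # Feature disabled in settings: nothing is logged or spoken.
            return b""
        prompt = self.transcribe(recorded_audio)
        reply = self.answer(prompt)
        # Keep the exchange so the app can display it in the chat.
        self.chat_log.append(ChatTurn(prompt, reply))
        return self.synthesize(reply)
```

The returned bytes are what the app would forward to the device in step 5; how they are encoded and sent over BLE is left open here.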

hope that helps.

beastoin avatar Nov 02 '24 04:11 beastoin

The speaker should support playback over BT. I don't mind looking into this after the Apple Watch PR, as it will use a similar two-way transport.

vincentkoc avatar Nov 02 '24 08:11 vincentkoc

I have started on this (#1243), but need to get some sleep. I think even before I flashed new firmware, any button click on my (red) devkit2 caused a fatal crash. Not sure if the shipped devkit2s have a different setup for the button? A different pin/setup for the button could be causing this issue.

The WIP code includes all the BLE setup to stream, and to handle the stream on the desktop side (Python). Once finalised, it can move to Dart code.
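For illustration only, here is one way the desktop side could split audio bytes into BLE-sized packets and put them back together. This is not the code from the WIP PR: the 2-byte sequence header, the function names, and the 20-byte default (the payload that fits in a default 23-byte ATT MTU) are all assumptions made for this sketch.

```python
import struct
from typing import Iterable, Iterator

# Hypothetical framing: each packet starts with a 2-byte little-endian
# sequence number, followed by raw audio payload bytes.
HEADER = struct.Struct("<H")


def packetize(audio: bytes, max_packet_size: int = 20) -> Iterator[bytes]:
    """Yield packets of at most max_packet_size bytes, header included."""
    payload_size = max_packet_size - HEADER.size
    if payload_size <= 0:
        raise ValueError("max_packet_size must exceed the header size")
    for seq, start in enumerate(range(0, len(audio), payload_size)):
        # struct raises if a single utterance needs more than 65535 packets.
        yield HEADER.pack(seq) + audio[start:start + payload_size]


def reassemble(packets: Iterable[bytes]) -> bytes:
    """Reorder packets by sequence number and join their payloads."""
    ordered = sorted(packets, key=lambda p: HEADER.unpack_from(p)[0])
    return b"".join(p[HEADER.size:] for p in ordered)
```

A sequence number is cheap to add and makes dropped or reordered notifications detectable on the receiving side, which is why the sketch includes one.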

vincentkoc avatar Nov 03 '24 14:11 vincentkoc

@Sanchay-T bro no worries, just keep building this and try to make it work. No one blames you - it's just we assign issues only after first PR

@vincentkoc @DamienDeepgram guys I believe in you. Let's make this work! (ideally today)

You are both working on this, if you both make it work, I'll make a post about both of you and we will solve the bounty issue

kodjima33 avatar Nov 03 '24 22:11 kodjima33

fighting 💪

beastoin avatar Nov 04 '24 01:11 beastoin

> @DamienDeepgram don't forget to ref your PR ;)

Sorry, yes, here is the PR. It still has an issue with playback not streaming correctly.

https://github.com/BasedHardware/omi/pull/1246

DamienDeepgram avatar Nov 05 '24 17:11 DamienDeepgram

Hey @kodjima33 @beastoin, has this issue been resolved? I've gone through the discussion and am willing to develop this. May I proceed?

ombhojane avatar Nov 19 '24 22:11 ombhojane

Please go ahead and keep us updated @ombhojane

beastoin avatar Nov 19 '24 23:11 beastoin

Sure @beastoin

ombhojane avatar Nov 20 '24 04:11 ombhojane

@beastoin I'm stuck setting up the project. I followed Omi's setup instructions, and at the last stage, while building the Android Gradle files, it downloaded more data than expected. So more time went into setting things up, and I'm still figuring it out. I need to see what's going on and how to fix it.

ombhojane avatar Nov 20 '24 15:11 ombhojane

So how do we integrate this feature if we do not have Omi dev kit devices? Or can we implement this feature on our Android/iOS devices, and if it works there, will it work with Omi devices?

himmat12 avatar Nov 23 '24 01:11 himmat12

@himmat12 you already know the answer man. the ticket title is super clear.

beastoin avatar Nov 23 '24 01:11 beastoin

@ombhojane how's it going?

if you want to get this ticket done - building the app / the firmware is a basic requirement.

beastoin avatar Nov 23 '24 01:11 beastoin

Hey @beastoin I'll figure this out today, yesterday was my exam. I'll try manual installation once.

ombhojane avatar Nov 23 '24 04:11 ombhojane

Hi @beastoin, I've set up Omi. Now I'm looking into the issue and will post updates on my progress.

ombhojane avatar Nov 26 '24 06:11 ombhojane