omi reintroduce opus on VAD, change frame size according to firmware v1.0, change realtime resolution for transcribe

Open 0xzre opened this issue 1 year ago • 1 comments

#518 The encoder in the Friend firmware v1.0 code uses a frame size of 160 samples (10 ms). I have not tested this on a Friend because I don't have the device. Changing the real-time resolution to the standard 20 ms should, in theory, reduce server load. Thank you!
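For reference, the frame-size arithmetic can be sketched as follows. A frame of 160 samples lasting 10 ms implies a 16 kHz sample rate; the helper names here are hypothetical and are not taken from the firmware or the backend.

```python
# Minimal sketch of the frame-size arithmetic. The function names are
# hypothetical; the 16 kHz rate is implied by 160 samples == 10 ms.

def frame_duration_ms(frame_size: int, sample_rate: int) -> float:
    """Duration in milliseconds of one frame of `frame_size` samples."""
    return 1000.0 * frame_size / sample_rate


def frame_size_for(duration_ms: float, sample_rate: int) -> int:
    """Number of samples in a frame lasting `duration_ms` milliseconds."""
    return int(round(sample_rate * duration_ms / 1000.0))


def chunks_per_second(duration_ms: float) -> float:
    """How many chunks per second a consumer handles at this resolution."""
    return 1000.0 / duration_ms


if __name__ == "__main__":
    print(frame_duration_ms(160, 16000))   # firmware frame: 160 samples
    print(frame_size_for(20, 16000))       # samples in a 20 ms chunk
    print(chunks_per_second(10), chunks_per_second(20))
```

At a 20 ms resolution the server handles half as many chunks per second as at 10 ms, which is the intuition behind the expected load reduction.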

Aug 19 '24 11:08 0xzre

It still doesn't work, there's no transcript. Also there's this warning and I am not sure if it is something to be worried about?

backend/routers/transcribe.py:102: UserWarning: The given buffer is not writable, and PyTorch does not support non-writable tensors. This means you can write to the underlying (supposedly non-writable) buffer using the tensor. You may want to copy the buffer to protect its data or make it writable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/utils/tensor_new.cpp:1530.)
  samples = torch.frombuffer(decoded_opus, dtype=torch.int16).float() / 32768.0

That warning is related to how I handle the decoded Opus buffer, and I'll fix it soon.
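One common way to address this kind of warning is to copy the read-only `bytes` object into a writable `bytearray` before converting it. The sketch below shows the idea with the standard library only; the function names are hypothetical, and little-endian 16-bit PCM is an assumption. In the line quoted in the warning, the analogous change would presumably be to pass `bytearray(decoded_opus)` to `torch.frombuffer`, which is not tested here.

```python
import struct

# Minimal sketch, standard library only. Function names are hypothetical;
# little-endian 16-bit PCM is an assumption about the decoded data.

def to_writable(decoded: bytes) -> bytearray:
    """Copy an immutable bytes object into a writable buffer."""
    return bytearray(decoded)


def pcm16le_to_floats(buf) -> list:
    """Convert little-endian int16 PCM bytes to floats in [-1.0, 1.0)."""
    count = len(buf) // 2  # two bytes per sample; ignore a trailing odd byte
    ints = struct.unpack("<%dh" % count, bytes(buf[: count * 2]))
    return [value / 32768.0 for value in ints]


if __name__ == "__main__":
    decoded = b"\x00\x40\x00\xc0"  # two samples: 16384 and -16384
    print(memoryview(decoded).readonly)               # bytes is read-only
    print(memoryview(to_writable(decoded)).readonly)  # the copy is writable
    print(pcm16le_to_floats(to_writable(decoded)))
```

The copy costs one extra allocation per packet, which is usually negligible for short audio frames.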

Aug 20 '24 01:08 0xzre

It sounds like the server load got heavier. More missed transcripts and slower results -> incoming bytes take a long time to process in VAD, which increases the delay to DG. More WebSocket disconnects -> ping/pong doesn't get through or isn't processed in time because of high CPU usage in VAD.

My solution:

Use ONNX Runtime for VAD
Decrease the VAD window, which is now 4x smaller
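
As a rough illustration of the second point, the sketch below splits a buffer of samples into fixed-size windows. The function name and the window sizes (512 and 128 samples) are hypothetical and are not the values used in this PR.

```python
# Minimal sketch of fixed-size windowing. The function name and the window
# sizes used below are hypothetical, not the values from this PR.

def split_windows(samples, window_size):
    """Split samples into full windows and return (windows, remainder).

    The remainder holds trailing samples that do not fill a window yet,
    so the caller can prepend them to the next incoming chunk.
    """
    full = len(samples) - len(samples) % window_size
    windows = [samples[i:i + window_size] for i in range(0, full, window_size)]
    return windows, samples[full:]


if __name__ == "__main__":
    audio = list(range(2048))
    large, _ = split_windows(audio, 512)
    small, _ = split_windows(audio, 128)
    print(len(large), len(small))
```

A window that is 4x smaller yields 4x as many windows for the same audio, each covering a shorter span, so a speech decision for any given span does not have to wait for as much audio to arrive.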

Any feedback or opinion is appreciated. Thanks!

Aug 21 '24 20:08 0xzre

Changes

More handling of socket2 data, which is always used when Opus (Friend mic, not phone) is involved. It aims to fix the socket-disconnected error while keeping PCM working. Any feedback is welcome, thank you :)

Aug 22 '24 19:08 0xzre

@josancamon19 @mdmohsin7 Already merged with the main branch, and it gives better results when using a speech profile. Please review, thanks!

Aug 25 '24 11:08 0xzre

https://share.icloud.com/photos/06dFrjm9Q_RrsvZO5VLScWGLg

Clearly doesn't work, for next review, please submit videos of it working through the app

Aug 28 '24 22:08 josancamon19

@josancamon19 @mdmohsin7 Drive link: https://drive.google.com/drive/folders/1h1nbyLAaVt72Wwy-yO_5C8L5_re17ptI?usp=sharing Please review thanks!

Aug 31 '24 06:08 0xzre

I have added more tests, this time on a lecture video (a more conversation-like situation), in the "test 1" folder. I also provided the PCM transcript from the Play Store app (no VAD) as the ground truth. The result: the latency is indistinguishable and the accuracy is much improved. Opus with VAD is usable.

Sep 01 '24 18:09 0xzre

dude @josancamon19

Sep 13 '24 03:09 0xzre

Moving PR to https://github.com/BasedHardware/omi/pull/922

Sep 26 '24 18:09 josancamon19
