
reintroduce opus on VAD, change frame size according to firmware v1.0, change realtime resolution for transcribe

Open 0xzre opened this issue 1 year ago • 1 comments

#518 The encoding in the Friend firmware v1.0 code shows that it uses a frame size of 160 samples (10 ms). I have not tested this on a Friend because I don't have the device. Changing the real-time resolution to the standard 20 ms should theoretically reduce server load. Thank you!
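For reference, the frame arithmetic behind those numbers can be sketched as below. The 16 kHz sample rate is an assumption inferred from "160 samples = 10 ms" and is not stated in the firmware notes quoted here; the function names are illustrative only.

```python
# Assumption: 16 kHz mono audio, inferred from "160 samples = 10 ms".
SAMPLE_RATE_HZ = 16_000


def frame_duration_ms(frame_size_samples, sample_rate_hz=SAMPLE_RATE_HZ):
    """Duration of one frame in milliseconds."""
    return 1000.0 * frame_size_samples / sample_rate_hz


def frames_per_second(frame_ms):
    """How many frames of the given duration fit into one second of audio."""
    return 1000.0 / frame_ms


if __name__ == "__main__":
    print(frame_duration_ms(160))   # 10.0 ms per frame
    print(frames_per_second(10.0))  # 100.0 frames per second
    print(frames_per_second(20.0))  # 50.0 frames per second
```

Going from 10 ms to 20 ms halves the number of units handled per second of audio, which is where the expected reduction in server load would come from.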

0xzre avatar Aug 19 '24 11:08 0xzre

It still doesn't work; there's no transcript. There's also this warning, and I'm not sure whether it is something to be worried about?

backend/routers/transcribe.py:102: UserWarning: The given buffer is not writable, and PyTorch does not support non-writable tensors. This means you can write to the underlying (supposedly non-writable) buffer using the tensor. You may want to copy the buffer to protect its data or make it writable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/utils/tensor_new.cpp:1530.)
  samples = torch.frombuffer(decoded_opus, dtype=torch.int16).float() / 32768.0

That warning is related to how I handle the decoded Opus buffer, and I'll fix it soon.
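A minimal sketch of the idea behind the fix, using only the standard library so it runs without PyTorch: `bytes` objects are read-only buffers, and copying them into a `bytearray` gives a writable one. With PyTorch, passing such a `bytearray` copy to `torch.frombuffer` is the commonly suggested way to avoid this warning; that is an assumption about the eventual fix, not what the PR actually does. The helper names are hypothetical.

```python
import array


def is_readonly(buf):
    """True if the object exposes a read-only buffer (as bytes does)."""
    return memoryview(buf).readonly


def pcm16_to_float(decoded):
    """Convert native-endian 16-bit PCM bytes to floats in [-1.0, 1.0)."""
    # Copy into a bytearray so the buffer is writable. In the torch version
    # this copy is what would be handed to torch.frombuffer (assumption).
    writable = bytearray(decoded)
    samples = array.array("h")  # signed 16-bit, native byte order
    samples.frombytes(writable)
    return [s / 32768.0 for s in samples]


if __name__ == "__main__":
    raw = array.array("h", [0, 16384, -32768]).tobytes()
    print(is_readonly(raw))             # True: bytes is immutable
    print(is_readonly(bytearray(raw)))  # False: the copy is writable
    print(pcm16_to_float(raw))          # [0.0, 0.5, -1.0]
```

The copy costs one extra allocation per decoded frame, which is small for 10 ms or 20 ms frames.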

0xzre avatar Aug 20 '24 01:08 0xzre

It sounds like the server load got heavier. More missed and slower transcripts -> incoming bytes take a long time to process in the VAD, which increases the delay to DG. More WebSocket disconnects -> ping/pong doesn't get through or isn't processed in time because of high CPU usage in the VAD.

My solution:

  • Use ONNX Runtime for the VAD
  • Decrease the VAD window, which is now 4x smaller

Any feedback or opinion is appreciated. Thanks!

0xzre avatar Aug 21 '24 20:08 0xzre

Changes

  • More handling of socket2 data, which is always used when Opus (Friend mic, not phone) is involved. It aims to solve the socket-disconnected error while keeping PCM working. Any feedback is welcome, thank you :)

0xzre avatar Aug 22 '24 19:08 0xzre

@josancamon19 @mdmohsin7 Already merged with the main branch, which gives better results when a speech profile is used. Please review, thanks!

0xzre avatar Aug 25 '24 11:08 0xzre

https://share.icloud.com/photos/06dFrjm9Q_RrsvZO5VLScWGLg

Clearly doesn't work. For the next review, please submit videos of it working through the app.

josancamon19 avatar Aug 28 '24 22:08 josancamon19

@josancamon19 @mdmohsin7 Drive link: https://drive.google.com/drive/folders/1h1nbyLAaVt72Wwy-yO_5C8L5_re17ptI?usp=sharing Please review, thanks!

0xzre avatar Aug 31 '24 06:08 0xzre

I have added more testing, this time on a lecture video (a more conversation-like situation), in the "test 1" folder. I also provided the PCM transcript from the Play Store app (no VAD) as the ground truth. The result: the latency is indistinguishable and the accuracy is much improved. VAD with Opus is usable.

0xzre avatar Sep 01 '24 18:09 0xzre

dude @josancamon19

0xzre avatar Sep 13 '24 03:09 0xzre

Moving PR to https://github.com/BasedHardware/omi/pull/922

josancamon19 avatar Sep 26 '24 18:09 josancamon19