
Mobile / Edge / ARM / ONNX Use Cases

Open snakers4 opened this issue 4 years ago • 23 comments

While the VAD (especially the micro one) was explicitly designed for IoT / edge / mobile use cases, we do not have the resources or expertise to provide instructions for the corresponding ARM / mobile builds of PyTorch and / or ONNX.

The ONNX guides were refurbished recently, and it is implied that ARM binaries will be made available (but they are not yet).

People from the community (see the Telegram chat) have also reported successful builds and use of silero-models on PyTorch, replacing MKL with CBLAS.

In any case, sharing such dockerized builds (e.g. based on Debian / Ubuntu / Alpine) for your tested use cases would be of great value to the community; PRs are greatly encouraged and appreciated.

Please see some examples here - https://github.com/microsoft/onnxruntime/blob/master/dockerfiles/README.md#arm-32v7

If you feel like doing something like this, please provide the build as a Dockerfile together with some background info: which arch / device / processor you are running it on, whether this hardware is generally available, what the end performance is, etc.

snakers4 avatar Feb 20 '21 06:02 snakers4
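For reference, a minimal sketch of the kind of performance report requested above, assuming PyTorch is already installed on the target ARM device and a silero-vad JIT model has been downloaded locally; the file name, chunk size and call signature are placeholders and vary between releases:

```python
# Hypothetical micro-benchmark for reporting "end performance" on an edge device.
# File name, chunk size and call signature are assumptions - check the release
# notes for the model you actually use.
import time
import torch

model = torch.jit.load("silero_vad.jit", map_location="cpu")
model.eval()

chunk = torch.zeros(1, 1536)          # dummy 16 kHz audio chunk (size differs per release)
with torch.no_grad():
    for _ in range(10):               # warm-up
        model(chunk, 16000)
    start = time.time()
    n_runs = 100
    for _ in range(n_runs):
        model(chunk, 16000)

print(f"avg latency: {(time.time() - start) / n_runs * 1000:.2f} ms per chunk")
```

Together with the arch / CPU info, a number like this gives a reproducible baseline that can go straight into a PR description.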

Some ARM builds here https://github.com/snakers4/silero-models/issues/70#issuecomment-823866940

snakers4 avatar Apr 22 '21 04:04 snakers4

Builds for armv7

Newo6174 avatar Jun 10 '21 07:06 Newo6174

Nice, this guy bothered to create a Dockerfile!

snakers4 avatar Jun 10 '21 07:06 snakers4

Tract, written in Rust, can easily be compiled for ARM devices, and streaming audio on edge devices is one of its use cases. I tried running the model with it, but there is an unimplemented op. Support for it is being tracked in the above-mentioned issue.

gstvg avatar Jul 29 '21 19:07 gstvg

Running silero-vad on Android - https://github.com/bgubanov/VadExample

You need to use this model - https://github.com/snakers4/silero-vad/blob/master/files/model_micro_mobile.jit

snakers4 avatar Sep 06 '21 12:09 snakers4

Will keep this issue pinned for everyone to see

snakers4 avatar Sep 16 '21 03:09 snakers4

I’m looking to do an iOS version.

  1. Can I assume the .jit model file is created by TorchScript, so I'd follow the instructions for using PyTorch for iOS (which currently requires C++)? There seems to be no convention for the file suffix, but .pt seems to be preferred after a quick search.
  2. How did you pretrain the models? I couldn’t immediately see any source code that produces that file.

boxabirds avatar Oct 02 '21 08:10 boxabirds

Can I assume the .jit model file is created by TorchScript, so I'd follow the instructions for using PyTorch for iOS (which currently requires C++)? There seems to be no convention for the file suffix, but .pt seems to be preferred after a quick search.

The .jit models are indeed created using TorchScript. A common problem with running them on mobile was STFT (the lack of Intel's MKL or something similar), which was fixed in this micro model - https://github.com/snakers4/silero-vad/blob/master/files/model_micro_mobile.jit. We are planning a large update soon, after which none of the models will have this problem.

People actually successfully managed to run the model on Android - https://github.com/snakers4/silero-vad/issues/37#issuecomment-913629296

Since PyTorch also has packages now and we created this project some time ago, we reserved the more obvious names - .jit for TorchScript, .onnx for ONNX. In hindsight probably none of this really matters.

How did you pretrain the models?

Using our internal datasets and algorithms. The public models now feature 4 or 5 languages and have some slight issues with post-processing parameter fine-tuning. Most likely, the next version of the models will be simplified (only a 1M-param model and a 10k-param model, maybe for 16 kHz and 8 kHz, most likely in JIT and ONNX), will cover ~100 languages, and will be easier to use.

I couldn’t immediately see any source code that produces that file.

We decided against sharing our pipelines since we consider our VAD a domain-agnostic solution as opposed to yet another toolkit. If some tweaks or optimizations are required, we do them as commercial projects.

snakers4 avatar Oct 02 '21 08:10 snakers4
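For the iOS route specifically, the usual way a TorchScript model is prepared for PyTorch Mobile looks roughly like the sketch below; this is generic PyTorch tooling rather than an official silero-vad export script, and file names are placeholders:

```python
# Hedged sketch: convert a downloaded .jit model into the lite-interpreter format
# that the PyTorch Mobile runtimes (iOS / Android) load. Not an official export script.
import torch
from torch.utils.mobile_optimizer import optimize_for_mobile

model = torch.jit.load("model_micro_mobile.jit", map_location="cpu")
model.eval()

mobile_model = optimize_for_mobile(model)
mobile_model._save_for_lite_interpreter("silero_vad_mobile.ptl")  # load this .ptl from the mobile app
```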

Great, thanks. BTW, your original web article's source code link is broken.

boxabirds avatar Oct 02 '21 08:10 boxabirds

Which of the articles? Can you send the link / tell which exact link is broken?

snakers4 avatar Oct 02 '21 09:10 snakers4

This one https://towardsdatascience.com/modern-portable-voice-activity-detector-released-417e516aadde#:~:text=webrtc%20though%20starts%20to%20show%20its%20age%20and%20it%20suffers%20from%20many%20false%20positives.

And the original one this is syndicated from

boxabirds avatar Oct 02 '21 09:10 boxabirds

For some reason the English article had a broken link to TorchHub. Fixed it, many thanks.

snakers4 avatar Oct 02 '21 09:10 snakers4

Hi @snakers4, as you pointed out, .jit is actually the same as .pt. But when I change the file extension from .jit to .pt, Netron can no longer visualize the model.pt file. Very confused - thank you.


dragen1860 avatar Oct 19 '21 02:10 dragen1860

Hi @snakers4, as you pointed out, .jit is actually the same as .pt. But when I change the file extension from .jit to .pt, Netron can no longer visualize the model.pt file. Very confused - thank you.

There is no real convention for these new things, i.e. TorchScript models (jit) or Torch packages (pt). Since this piece of software (Netron) was probably envisaged and written 1-2 years ago, most likely they faced the same question and decided to keep this distinction as well.

snakers4 avatar Oct 19 '21 05:10 snakers4
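To confirm that only the suffix differs, one can load the same file under either name; a tiny sketch, with paths as placeholders:

```python
# The extension is cosmetic: torch.jit.load does not care whether the file is
# called .jit or .pt, so renaming it does not change the format in any way.
import shutil
import torch

shutil.copy("silero_vad.jit", "silero_vad.pt")
a = torch.jit.load("silero_vad.jit", map_location="cpu")
b = torch.jit.load("silero_vad.pt", map_location="cpu")
print(type(a).__name__, type(b).__name__)   # both load as the same ScriptModule type
```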

The new VAD model in the v3.0 release should be compatible with all versions of PyTorch (mobile, ARM, x86, etc.) because:

  • it does not use the built-in FFT;
  • due to the small model size, we decided to publish a single non-quantized version of the model.

snakers4 avatar Dec 07 '21 10:12 snakers4
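For completeness, the documented torch.hub entry point for the JIT model looks roughly like this; the exact contents of the utils tuple may differ between releases, and 'example.wav' is a placeholder:

```python
# Sketch of the standard torch.hub usage for the v3-era JIT model; the utils tuple
# order follows the README of that era and may change in later releases.
import torch

model, utils = torch.hub.load(repo_or_dir='snakers4/silero-vad',
                              model='silero_vad')
(get_speech_timestamps, save_audio, read_audio, VADIterator, collect_chunks) = utils

wav = read_audio('example.wav', sampling_rate=16000)          # placeholder file
speech_timestamps = get_speech_timestamps(wav, model, sampling_rate=16000)
print(speech_timestamps)
```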

Some interesting comments on this topic worthy of sharing with the general public:

  • https://github.com/snakers4/silero-vad/discussions/126#discussioncomment-1778341
  • https://github.com/snakers4/silero-vad/discussions/126#discussioncomment-1768924
  • https://twitter.com/sepia_fw/status/1468582438435762179

snakers4 avatar Dec 10 '21 05:12 snakers4

The new release has an ONNX model, albeit only for 16 kHz - https://github.com/snakers4/silero-vad/releases/tag/v3.1

snakers4 avatar Dec 17 '21 15:12 snakers4
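Since the ONNX input and output names have changed between releases, a safe first step on an edge device is simply to inspect the exported graph with onnxruntime before wiring it into a pipeline; a minimal sketch, with the file name as a placeholder:

```python
# Inspect the exported ONNX model instead of guessing tensor names, which differ
# between silero-vad releases.
import onnxruntime as ort

sess = ort.InferenceSession("silero_vad.onnx", providers=["CPUExecutionProvider"])
for i in sess.get_inputs():
    print("input: ", i.name, i.shape, i.type)
for o in sess.get_outputs():
    print("output:", o.name, o.shape, o.type)
```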


Hello, when I use the new models to run silero-vad on Android - https://github.com/bgubanov/VadExample - there are still errors: "RuntimeError: stft: ATen not compiled with MKL support". Is there a solution here?

uloveqian2021 avatar Dec 21 '21 06:12 uloveqian2021

errors about "RuntimeError: stft: ATen not compiled with MKL support". Is there a solution here?

@uloveqian2021, you are using the new V3 model, right? This one - https://github.com/snakers4/silero-vad/blob/master/files/silero_vad.jit ?

The old model works?

This model should not contain torch.stft, unless we made a mistake during export

@adamnsandle

Can you check this please?

snakers4 avatar Dec 21 '21 06:12 snakers4

Hello, when I use the new models to run silero-vad on Android - https://github.com/bgubanov/VadExample - there are still errors: "RuntimeError: stft: ATen not compiled with MKL support". Is there a solution here?

Our current JIT and ONNX models do not contain torch.stft; the initial 3.0 release did use torch.stft, but we have since replaced it with a mobile-supported analogue.

Please try the latest models, it should work.

adamnsandle avatar Dec 21 '21 08:12 adamnsandle
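A quick way to check whether a downloaded .jit model still references the problematic op is to scan its scripted code for stft; a purely diagnostic sketch, with the file name as a placeholder:

```python
# Diagnostic sketch for the "stft: ATen not compiled with MKL support" error:
# look for torch.stft in the scripted submodules of a downloaded .jit model.
import torch

model = torch.jit.load("silero_vad.jit", map_location="cpu")
uses_stft = False
for module in model.modules():
    try:
        if "stft" in module.code:
            uses_stft = True
            break
    except (RuntimeError, AttributeError):
        # some scripted submodules expose no Python-readable code
        pass

print("model references stft:", uses_stft)
```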

@uloveqian2021 Can you please check whether the problem goes away with the latest model?

snakers4 avatar Dec 21 '21 09:12 snakers4

@uloveqian2021 Can you please check whether the problem goes away with the latest model?

Yes, it works now, thanks!

wangbq18 avatar Dec 21 '21 14:12 wangbq18


This is an automatic vacation reply from QQ Mail. Hello! I have received your email!

wangbq18 avatar Apr 27 '23 19:04 wangbq18