
Changelog - V5 just released!

snakers4 opened this issue 4 years ago • 32 comments

Just a handy issue to be notified of the latest changes and micro-releases (we will mostly be changing the models)

snakers4, Dec 15 '20

Uploaded initial models, examples and utils for VAD only (no number detector or language classifier yet)

snakers4, Dec 15 '20

First readable public release

snakers4, Dec 15 '20

Added VAD latency and throughput metrics

snakers4, Dec 17 '20

Updated the VAD quality image (precision / recall, before / after)

snakers4, Dec 22 '20

Added < 250 ms compatibility image

adamnsandle, Dec 24 '20

Added number detector

Sontref, Dec 31 '20

Added a language detector example, updated the README and added an FAQ

snakers4, Jan 11 '21

Added Auditok benchmarks. It looks like all energy-based solutions perform roughly the same.

snakers4, Jan 20 '21

Added a utility to tune the VAD params properly for a domain
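The entry does not say how the tuning utility works. As a generic, hypothetical illustration of tuning for a domain, one option is to sweep the speech-probability threshold over a few labeled windows from that domain and keep the value with the best F1 score. The data, helper names and the choice of F1 are assumptions made for this sketch.

```python
from typing import List, Sequence, Tuple


def precision_recall_f1(probs: Sequence[float], labels: Sequence[int],
                        threshold: float) -> Tuple[float, float, float]:
    """Window-level precision, recall and F1 for one speech threshold."""
    preds = [1 if p >= threshold else 0 for p in probs]
    tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
    fp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def best_threshold(probs: Sequence[float], labels: Sequence[int],
                   candidates: List[float]) -> float:
    """Hypothetical helper: pick the candidate threshold with the highest F1."""
    return max(candidates, key=lambda t: precision_recall_f1(probs, labels, t)[2])


# Made-up per-window speech probabilities and ground-truth labels (1 = speech)
probs = [0.2, 0.45, 0.6, 0.9, 0.3, 0.55]
labels = [0, 1, 1, 1, 0, 0]
print(best_threshold(probs, labels, [0.3, 0.4, 0.5, 0.6]))
```

In practice the labeled windows would come from the target domain, and the same sweep could cover other post-processing parameters.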

snakers4, Feb 01 '21

Posted some final benchmarks here: https://github.com/pyannote/pyannote-audio/issues/604#issue-798003383

We are probably done with benchmarks for now.

snakers4, Feb 03 '21

Added micro (10k params, 100x smaller) VAD models

snakers4, Feb 11 '21

Added micro (10k params, 100x smaller) VAD models for 8 kHz audio

snakers4, Mar 22 '21

  • Added mini (100k params) VAD models for 8 kHz and 16 kHz
  • Added an adaptive VAD iterator

https://github.com/snakers4/silero-vad/pull/54

snakers4, Apr 12 '21

  • Fixed examples and notebooks
  • Updated README
  • Added adaptive examples

snakers4, Apr 16 '21

  • Added a language classifier for 116 languages
  • It classifies audio into languages and mutually intelligible language groups (e.g. Serbian + Bosnian + Croatian, Russian + Ukrainian + others, Hindi + Urdu, etc.), see the full list here and here
  • Some artificial / unspoken languages will probably be excluded, and a large model will be trained

snakers4, Jul 09 '21

Improved language classifier

  • 95 languages (85% accuracy), 58 language groups (90% accuracy)
  • Mutually intelligible languages are united into language groups (e.g. Serbian + Croatian + Bosnian are very similar)
  • Trained on approximately 20k hours of data (10k of which are for the 5 most popular languages)
  • 4.7M params

snakers4, Jul 21 '21

Updated the further reading section

snakers4, Oct 14 '21

New V3 Silero VAD is Already Here

Main changes

  • One VAD to rule them all! New model includes the functionality of the previous ones with improved quality and speed!
  • Flexible sampling rate, 8000 Hz and 16000 Hz are supported;
  • Flexible chunk size, minimum chunk size is just 30 milliseconds!
  • 100k parameters;
  • GPU and batching are supported;
  • Radically simplified examples;
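Chunk sizes given in milliseconds translate to sample counts as samples = sampling_rate × duration_ms / 1000. A small pair of helpers (hypothetical, not part of silero-vad) makes the arithmetic concrete: the 30 ms minimum chunk is 480 samples at 16 kHz and 240 samples at 8 kHz, and the 1536-sample window in the VADIterator example corresponds to 96 ms at 16 kHz.

```python
def ms_to_samples(duration_ms: float, sampling_rate: int) -> int:
    """Number of samples covered by a chunk of `duration_ms` milliseconds."""
    return int(round(sampling_rate * duration_ms / 1000))


def samples_to_ms(num_samples: int, sampling_rate: int) -> float:
    """Duration in milliseconds of a chunk of `num_samples` samples."""
    return num_samples * 1000 / sampling_rate


# Minimum chunk size for both supported sampling rates
for sr in (8000, 16000):
    print(f"{sr} Hz: 30 ms -> {ms_to_samples(30, sr)} samples")

# Duration of the 1536-sample window at 16 kHz
print(f"1536 samples at 16000 Hz -> {samples_to_ms(1536, 16000)} ms")
```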

Migration

Please see the new examples.

The new get_speech_timestamps is a simplified and unified replacement for the old, deprecated get_speech_ts and get_speech_ts_adaptive methods.

speech_timestamps = get_speech_timestamps(wav, model, sampling_rate=16000)
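get_speech_timestamps turns model outputs into speech segments. As a rough, hypothetical illustration of that kind of post-processing (not the library's implementation, which has its own parameters and logic), the sketch below assumes one speech probability per fixed-size window, marks windows at or above a threshold as speech, merges consecutive speech windows, and drops segments shorter than a minimum length. It returns a list of {'start': ..., 'end': ...} dicts in samples; the function name and defaults are made up for this example.

```python
from typing import Dict, List, Optional


def probs_to_segments(
    probs: List[float],
    window_size_samples: int,
    threshold: float = 0.5,
    min_speech_samples: int = 0,
) -> List[Dict[str, int]]:
    """Toy post-processing: per-window speech probabilities -> segments in samples."""
    segments: List[Dict[str, int]] = []
    start: Optional[int] = None
    for i, p in enumerate(probs):
        if p >= threshold and start is None:
            start = i * window_size_samples  # speech begins at this window
        elif p < threshold and start is not None:
            # speech ended before this window
            segments.append({"start": start, "end": i * window_size_samples})
            start = None
    if start is not None:
        # audio ended during speech; end is rounded up to the last full window
        segments.append({"start": start, "end": len(probs) * window_size_samples})
    # drop segments too short to count as speech
    return [s for s in segments if s["end"] - s["start"] >= min_speech_samples]


probs = [0.1, 0.9, 0.8, 0.2, 0.7, 0.1, 0.95, 0.9, 0.9]
print(probs_to_segments(probs, window_size_samples=512, min_speech_samples=1024))
```

Dividing the sample offsets by the sampling rate gives timestamps in seconds.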

The new VADIterator class serves as an example for streaming tasks, replacing the old, deprecated VADiterator and VADiteratorAdaptive.

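vad_iterator = VADIterator(model)  # model: the loaded Silero VAD model; wav: audio sampled at 16 kHz
window_size_samples = 1536  # number of samples per audio chunk (96 ms at 16 kHz)

for i in range(0, len(wav), window_size_samples):
    # a non-empty dict marks a speech start or end; otherwise nothing is reported
    speech_dict = vad_iterator(wav[i: i + window_size_samples], return_seconds=True)
    if speech_dict:
        print(speech_dict, end=' ')
vad_iterator.reset_states()  # reset model states after each audio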
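VADIterator keeps state between chunks and reports boundaries as they occur, which is why reset_states() is called after each audio. A toy, model-free state machine in the same spirit shows the idea. It is a hypothetical class that takes precomputed probabilities instead of audio, and its thresholds are assumptions. It enters speech when the probability reaches threshold, leaves it only when the probability drops below a lower neg_threshold (so one borderline chunk does not split a segment), and reports times in seconds.

```python
from typing import Dict, Optional


class ToyStreamingVAD:
    """Hypothetical streaming detector: feed one speech probability per chunk and
    get {'start': t} or {'end': t} (in seconds) when the speech state changes."""

    def __init__(self, sampling_rate: int = 16000, threshold: float = 0.5,
                 neg_threshold: float = 0.35):
        self.sampling_rate = sampling_rate
        self.threshold = threshold          # probability needed to enter speech
        self.neg_threshold = neg_threshold  # probability below which speech ends
        self.reset_states()

    def reset_states(self) -> None:
        """Forget all state; call between independent audio streams."""
        self.triggered = False
        self.current_sample = 0

    def __call__(self, speech_prob: float, chunk_len: int) -> Optional[Dict[str, float]]:
        chunk_start = self.current_sample
        self.current_sample += chunk_len
        if not self.triggered and speech_prob >= self.threshold:
            self.triggered = True
            return {"start": chunk_start / self.sampling_rate}
        if self.triggered and speech_prob < self.neg_threshold:
            self.triggered = False
            return {"end": chunk_start / self.sampling_rate}
        return None  # no boundary in this chunk


vad = ToyStreamingVAD()
events = []
for prob in [0.1, 0.8, 0.4, 0.9, 0.2, 0.1]:  # 0.4 stays above neg_threshold
    event = vad(prob, chunk_len=1600)        # 1600 samples = 0.1 s at 16 kHz
    if event:
        events.append(event)
vad.reset_states()
print(events)
```

The gap between the two thresholds is the design choice: with a single threshold, the 0.4 chunk would have ended the segment and the 0.9 chunk would have started a new one.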

snakers4, Dec 07 '21

Even Better V3 Silero VAD

  • Models with even higher quality (just see the plots with metrics!);
  • The new model is roughly on par with the large model and far better than all previous (even large) models;
  • The model now behaves as expected quality-wise, i.e. 100 ms > 60 ms > 30 ms chunks and 16 kHz > 8 kHz;

snakers4, Dec 10 '21

This summarises new progress well

image

snakers4, Dec 10 '21

New V3 ONNX VAD Released

We were finally able to port the model to ONNX:

  • Compact model (~100k params);
  • Neither the PyTorch nor the ONNX model is quantized;
  • Same quality as the latest best PyTorch release;
  • Only 16 kHz is available for now (ONNX has some issues with if-statements and / or tracing vs scripting, which produce cryptic errors);
  • In our tests, ONNX is 2-3x faster than PyTorch on short audios (chunks); the gap narrows with larger batches or long audios;
  • Audio examples and non-core models were moved out of the repo to save space;

snakers4, Dec 17 '21

Support For Sampling Rates Higher Than 16 kHz

  • The JIT model can now handle 8, 16, 32 and 48 kHz directly (implemented within the model itself);
  • The ONNX model as well, but only via external wrappers;
  • This support is mostly a hack, i.e. we just use each n-th sample for higher sampling rates (instead of averaging);
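Using each n-th sample is naive decimation by n = sampling_rate / 16000 (2 for 32 kHz, 3 for 48 kHz). Without a low-pass filter, content above 8 kHz can alias into the 16 kHz signal, which averaging each group of n samples partly suppresses. A minimal sketch with hypothetical helper names compares the two approaches. It assumes the input rate is an integer multiple of 16 kHz; 8 kHz input is handled natively by the model and is out of scope here.

```python
from typing import List


def _step(sampling_rate: int) -> int:
    """Downsampling factor to reach 16 kHz (assumes an integer multiple of 16 kHz)."""
    if sampling_rate % 16000 != 0:
        raise ValueError("sampling rate must be a multiple of 16000")
    return sampling_rate // 16000


def decimate_to_16k(samples: List[float], sampling_rate: int) -> List[float]:
    """Keep every n-th sample (no anti-aliasing filter)."""
    return samples[::_step(sampling_rate)]


def average_to_16k(samples: List[float], sampling_rate: int) -> List[float]:
    """Average each group of n samples (a crude low-pass before downsampling)."""
    step = _step(sampling_rate)
    usable = len(samples) - len(samples) % step  # drop an incomplete trailing group
    return [sum(samples[i:i + step]) / step for i in range(0, usable, step)]


audio_48k = [float(i) for i in range(9)]  # 9 samples at 48 kHz
print(decimate_to_16k(audio_48k, 48000))  # [0.0, 3.0, 6.0]
print(average_to_16k(audio_48k, 48000))   # [1.0, 4.0, 7.0]
```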

snakers4, Dec 21 '21

⚠️ Important Information for VAD Python Users ⚠️

If you are using the VAD in a:

  • multi-threaded or
  • multi-process application

Do not forget to disable gradients in EACH process and / or thread. Otherwise memory may leak noticeably.
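PyTorch's grad mode is thread-local, so turning it off in the main thread does not carry over to threads started later. The minimal sketch below is not taken from the silero-vad repo. One worker forgets to disable gradients and another disables them. In a multi-process setup, the same call (or a `with torch.no_grad():` block around inference) belongs at the start of each worker process.

```python
import threading

import torch


def worker(results: dict, name: str, disable_grad: bool) -> None:
    if disable_grad:
        # Grad mode is thread-local: this only affects the current thread
        torch.set_grad_enabled(False)
    x = torch.ones(3, requires_grad=True)
    y = x * 2  # records autograd history only if grad mode is on in this thread
    results[name] = y.requires_grad


torch.set_grad_enabled(False)  # disabling it here affects only the main thread
results: dict = {}
threads = [
    threading.Thread(target=worker, args=(results, "forgot", False)),
    threading.Thread(target=worker, args=(results, "disabled", True)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results["forgot"], results["disabled"])  # True False
```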

snakers4, Feb 25 '22

image

image

snakers4, Feb 25 '22

New V4 VAD Released

Changes:

  • Improved quality
  • Improved performance
  • Both 8k and 16k sampling rates are now supported by the ONNX model
  • Batching is now supported by the ONNX model
  • Added an audio_forward method for one-line processing of one or more audios without postprocessing
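Processing whole audios "without postprocessing" means getting raw per-window scores back instead of timestamps. The conceptual sketch below covers the framing step such a helper needs: zero-pad each audio to a whole number of windows, split it, and score every window. It uses hypothetical names, an assumed window_size_samples, and a placeholder energy scorer in place of the neural model. It is not the actual audio_forward implementation.

```python
from typing import Callable, List

Scorer = Callable[[List[float]], float]


def frame_and_score(batch: List[List[float]], scorer: Scorer,
                    window_size_samples: int = 512) -> List[List[float]]:
    """Hypothetical sketch: one raw score per window for each audio, no postprocessing."""
    out: List[List[float]] = []
    for audio in batch:
        remainder = len(audio) % window_size_samples
        if remainder:
            # zero-pad the tail so the last window is full length
            audio = audio + [0.0] * (window_size_samples - remainder)
        out.append([scorer(audio[i:i + window_size_samples])
                    for i in range(0, len(audio), window_size_samples)])
    return out


def energy_scorer(window: List[float]) -> float:
    """Placeholder scorer: mean absolute amplitude (stands in for the model)."""
    return sum(abs(s) for s in window) / len(window)


batch = [[0.0] * 512 + [0.5] * 512, [1.0] * 700]  # two audios of different lengths
print(frame_and_score(batch, energy_scorer))
```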

adamnsandle, Oct 26 '22

It is worth posting this chart:

image

snakers4, Oct 27 '22

  • Removed Picovoice mentions

snakers4, Mar 29 '23

  • Deprecated the language classifier and number detector models, since they are no longer maintained.

snakers4, Apr 27 '23