silero-models
Changelog
This mirrors the changelog: some important changes are too small for a full release.
2020-10-03 Batched ONNX and TF Models
- Extensively clean up and simplify ONNX and TF model code
- Add batch support to TF and ONNX models
- Update examples
- (pending) Submit new models to TF Hub and update examples there
2020-10-28 Minor PyTorch 1.7 fix
- The `torch.hub.load` signature was changed
2020-11-03 English Model V2 Released
- A minor release, i.e. other models are not affected
- The English model was made much more robust to certain dialects
- Performance metrics coming soon
- PS: the model should also generalize much better overall
2020-11-03 [Experimental] Ukrainian Model V1 Released
- An experimental model
- Trained on a small community-contributed corpus
- New: full model size reduced to 85 MB
- New: quantized model is only 25 MB
- No TF or ONNX models
- Will be re-released as a model fine-tuned on a larger Russian corpus upon the V3 release
2020-11-26 Fix TensorFlow Examples
Google locked down their tf.hub utils, so the TensorFlow examples had to be fixed.
2020-12-04 Add EE Distro Sizing and New Speed Metrics
- https://github.com/snakers4/silero-models/wiki/Performance-Benchmarks
- Moved some issues with useful answers to Discussions and marked some answers as "solved"
- Replaced CDN links with ordinary links
- Migrated to our own file hosting in preparation for new releases
Ukrainian Model V3 Released
- Trained on a larger corpus (~1000 hours)
- Fine-tuned from a commercial production Russian model
- Model flavors: `jit` (CPU or GPU), `jit_q` (quantized, CPU only), and `onnx` (ONNX)
- Huge model speed improvements for CPU inference (roughly 2-3x) compared to the previous one, comparable with the `new best` from here
- Will be dropping TF support altogether
- No proper quality benchmarks for an experimental model though
Added the current state to the changelog; added more updates regarding the new Ukrainian model
https://github.com/snakers4/silero-models/commit/2fb61a1c5a420c9fc73f7c1e0b2c92cc72bb83ca
TTS models pre-release; some doc improvements; working on the V3 model release
2021-04-20 Add v3 STT English Models
Huge update for English!
- Default model (`jit` or `onnx`) size is reduced by almost 50% without sacrificing quality (!);
- New model flavours: `jit_q` (smaller quantized model), `jit_skip` (with exposed skip connections), `jit_large` (higher quality model), `onnx_large` (!);
- New smallest model `jit_q` is only 40 MB in size (!);
- TensorFlow checkpoints discontinued;
- New performance benchmarks: default models are on par with previous models and Google; large models mostly outperform Google (!);
- Even more quality improvements coming soon (!);
- CE benchmarks coming soon;
- An `xsmall` model was created (2x smaller than the default), but I could not quantize it; I am looking into creating an `xxsmall` model;
- Still working on making EE models fully JIT-traceable;
2021-04-21 Add v3 xsmall STT English Models
- Polish docs;
- Add `xsmall` and `xsmall_q` model flavours for `en_v3`;
- Polish the performance benchmarks page a bit;
Added minimal standalone TTS example
Added v4_0 large English model, metrics coming soon
Added v4_0 large English model metrics
2021-06-18 Large V2 TTS Release, v4_0 Large English STT Model
- Added v4_0 large English model with metrics;
- V2 TTS models with a 4x faster vocoder;
- Russian models now feature automatic stress and `ё`; homonyms are not handled yet;
- A multi-language, multi-speaker model;
From now on, we will also repost our EE solution changelogs here.
Silero Models EE, First Numbered Version v1.1 🚀 (Mar 23, 2021)
Bug Fixes 🐛
- Extractor / post-processing bugs (Russian number and date normalization edge cases):
  - "двадцатое ноль шестое" ("the twentieth, zero-sixth"): https://github.com/snakers4/silero-models-ee/issues/2#issuecomment-797553296
  - "третьего марта" -> "3 марта" ("of the third of March" -> "March 3"): https://github.com/snakers4/silero-models-ee/issues/2#issuecomment-779681707
  - "тысяча второй" ("one thousand second"): https://github.com/snakers4/silero-models-ee/issues/2#issuecomment-788051341
- Packaging:
  - Fixed some library versioning issues
New Fields ➕
- New field `transcript_denorm`: transcribed text without normalization / post-processing;
Distributions 💽
- First numbered release, version `1.1`
- New distros entirely focused on better handling of numbers https://github.com/snakers4/silero-models-ee/commit/1e2857f6af8f2c7afe101db58052713eba30f186
- Migrated to a private Docker Hub https://github.com/snakers4/silero-models-ee/commit/06f654cfae25aab8fed58a1d5cfdcf20d414fad1
- Distribution compatibility across versions https://github.com/snakers4/silero-models-ee/commit/78395ed01f30065e510bc8e90d7d26d020efc7df
Sizing ⚡
- `xxsmall` sizing added
- `CPUSET_LM` environment variable to isolate AM and LM workers; AM and LM worker "thrashing" fixed
- A previous sizing replication attempt shows that Intel is now better than AMD for standard MKL-based PyTorch builds (which our distros are based on)
- Updated sizings on several machines
- RAM requirements heavily optimized due to more efficient startup https://github.com/snakers4/silero-models-ee/commit/d0463344c4337af9a6acfcc64e0ed5de9d6d440b
New Environment Variables 🎛️
- Mandatory
  - `VERSION=1.1`: distro version https://github.com/snakers4/silero-models-ee/commit/0d3f16fe68e9589ef89dcb431ecf32cca8501c90
  - `CPUSET_LM=10-15`: environment variable to isolate AM and LM workers https://github.com/snakers4/silero-models-ee/commit/d3f418b53376d2b41c2690331cf7b2ea84f3d462
- Optional
  - `MAX_DIAR_TIME_PER_CHUNK=1`: timeout for diarization https://github.com/snakers4/silero-models-ee/commit/0d3f16fe68e9589ef89dcb431ecf32cca8501c90
  - `MAX_AUDIO_LENGTH=1024`: max audio length https://github.com/snakers4/silero-models-ee/commit/0d3f16fe68e9589ef89dcb431ecf32cca8501c90
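For illustration, a cpuset-style range such as `CPUSET_LM=10-15` could be parsed like this (a hypothetical sketch; `parse_cpuset` is not part of the distro, which handles these values internally):

```python
import os

def parse_cpuset(spec):
    # Parse a cpuset-style spec such as "10-15", "3", or "0-1,4" into a
    # list of core indices (illustrative only, not the distro's parser).
    cores = []
    for part in spec.split(","):
        if "-" in part:
            lo, hi = part.split("-")
            cores.extend(range(int(lo), int(hi) + 1))
        else:
            cores.append(int(part))
    return cores

# Read the variable with the documented example value as a fallback
cpuset_lm = os.environ.get("CPUSET_LM", "10-15")
print(parse_cpuset(cpuset_lm))
```

Pinning the LM workers to a disjoint core set like this is what prevents the AM/LM "thrashing" mentioned in the sizing notes above.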
Silero Models EE, v1.2 STT Quality Improvements, TTS Release, gRPC, Packaging Improvements
Bug Fixes 🐛
- Minor post-processing bugs fixed;
- Collected edge cases were used for quality control;
- Performance degradation related to batches with audio files of very different lengths partially fixed (50-70%);
STT Model Improvements and Simplifications 🚀
- Model naming simplification;
- Several internal releases and internal model simplifications;
- New higher quality STT models: `ru_xlarge_v012.model` for GPU only and `ru_small_v012_q.model` for CPU only;
- The CPU model is quantized (the non-quantized version is not provided to avoid confusion); the quality gap between the quantized and original versions of this new model is negligible;
- Quality of the new `xlarge` model is in line with the `bleeding edge` model from this article;
- Major library version facelift; PyTorch images are now based on v1.9;
- Model freeze during initial model warmup and loading, giving a small additional speed boost;
- LM startup fixed for a large number of LM workers: the LM file is now locked and LMs are launched consecutively instead of with a random delay;
Deprecations 🚫
- `xsmall` and `large` models deprecated for simplicity;
- The legacy post-processing pipeline is deprecated entirely in favor of the new one; there will now be only one `decoder.py`;
- Because of LM file locking, using old license files with new images may result in slower LM loading for large installations;
- See this change. To avoid confusion in the future, it is advised to use `pytransform.so => pytransform.so` mounts (also please make sure to consult the compatibility table);
STT Model Metrics 💎
All of these metrics are calculated following this article on 1-hour subsets (hence they can differ slightly from the historical ones):
| Domain | API (ru_xlarge_v1_postv2) | Bleeding Edge | xlarge_v012 | small_v012_q |
|---|---|---|---|---|
| Reading | 7 | 6 | 5.8 | 8.7 |
| Directory assistance | 16 | 11 | 10.9 | 14.6 |
| Taxi | 13 | 12 | 11.6 | 16.7 |
| Public speeches | 14 | 12 | 12.3 | 17.4 |
| Radio | 18 | 15 | 15.7 | 21.3 |
| Court | 20 | 20 | 17.7 | 22.9 |
| Audiobooks | 24 | 20 | 20 | 25.2 |
| Directory assistance | 25 | 20 | 21 | 26.7 |
| Airport | 21 | 22 | 21.5 | 27.1 |
| Finance (operator) | 25 | 24 | 21.8 | 27.5 |
| YouTube | 28 | 25 | 23.6 | 30.6 |
| Smart speaker | 30 | 27 | 25.3 | 31.9 |
| Smart speaker (far-field) | 41 | 27 | 27.2 | 35.3 |
| E-commerce | 29 | 29 | 28 | 35.5 |
| Yellow pages | 32 | 29 | 30 | 35.9 |
| Dispatch | 41 | 32 | 32.2 | 39.2 |
| Medical terms | 35 | 33 | 32.7 | 39.7 |
| Bank | 39 | 35 | 36.3 | 40.9 |
| Pranks | 41 | 35 | 36.4 | 43.8 |
| Poetry, rap | 43 | 41 | 46.2 | 53.1 |
| Average | 27.1 | 23.75 | 23.81 | 29.7 |
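The Average row is the plain arithmetic mean of the per-domain values; as a quick sanity check, here is the Bleeding Edge column transcribed from the table above:

```python
# Per-domain metrics from the "Bleeding Edge" column of the table above
bleeding_edge = [6, 11, 12, 12, 15, 20, 20, 20, 22, 24,
                 25, 27, 27, 29, 29, 32, 33, 35, 35, 41]

# Arithmetic mean over the 20 domains
average = sum(bleeding_edge) / len(bleeding_edge)
print(average)  # 23.75, matching the Average row
```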
TTS Release 🎙️
- TTS release following these articles: 1, 2;
- Commercial speaker models available: `tts_aidar_v012.pt`, `tts_baya_v012.pt`, `tts_kseniya_v012.pt`;
- Automated stress and `ё` for ~97% of all cases in the Russian language;
New Features ➕
- TTS release;
- New STT models;
- New experimental gRPC interface; it still requires some testing and polish (WIP: at the moment, one VAD parameter should probably be tuned per installation and hence will be added to the Environment);
Distributions and Packaging 💽
- Docker image security improvements;
- Several sizes of VAD provided for gRPC;
- New images
v1.2
, migration to PyTorch 1.9, library version updates and compatibility testing;
Sizing ⚡
- New sizing for TTS models;
- New sizing for gRPC interface;
- Updated sizing for STT models;
New Environment Variables 🎛️
Please see the respective docs for more detailed information.
2021-08-09 German V3 Large Model
- German V3 Large `jit` model trained on more data: a large quality improvement;
- Metrics coming soon;
2021-09-03 German V4 and English V5 Models
- German V4 large `jit` and `onnx` models;
- English V5 `small` (`jit` and `onnx`), `small_q` (only `jit`), and `xlarge` (`jit` and `onnx`) models;
- Vast quality improvements (metrics to be added shortly) on the majority of domains;
- English `xsmall` models coming soon (`jit`);
Better progress visualization for English EE models
Quick update - added English V5 quantized ONNX model
- https://github.com/snakers4/silero-models/blob/master/models.yml#L11
2021-10-06 Text Recapitalization and Repunctuation Model for 4 Languages
- Inserts capital letters and basic punctuation marks (dot, comma, hyphen, question mark, exclamation mark, dash for Russian);
- Works for 4 languages (Russian, English, German, Spanish) and can be extended;
- By design, it is domain-agnostic and not based on any hard-coded rules;
- Has non-trivial metrics and succeeds in the task of improving text readability;
Quick update - updated list of articles - https://github.com/snakers4/silero-models#further-reading
2021-12-09 Improved Text Recapitalization and Repunctuation Model for 4 Languages
- The model can now work with long inputs: 512 tokens, or ca. 150 words;
- Inputs longer than 150 words are automatically processed in chunks;
- Bugs with newer PyTorch versions have been fixed;
- The model was trained longer with larger batches;
- Model size slightly reduced to 85 MB;
- The remaining model optimizations were deemed too high-maintenance;
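The chunked processing of long inputs can be sketched roughly as follows (a minimal illustration only; `chunk_words`, `process_long_input`, and the fixed 150-word limit are assumptions for this sketch, not the model's actual internals):

```python
def chunk_words(text, max_words=150):
    # Split text into consecutive chunks of at most `max_words` words
    # (hypothetical helper, not the actual silero implementation).
    words = text.split()
    return [" ".join(words[i:i + max_words])
            for i in range(0, len(words), max_words)]

def process_long_input(text, enhance, max_words=150):
    # `enhance` stands in for a call to the repunctuation model;
    # each chunk is processed independently and the results rejoined.
    return " ".join(enhance(chunk) for chunk in chunk_words(text, max_words))

if __name__ == "__main__":
    sample = " ".join(f"word{i}" for i in range(400))
    print(len(chunk_words(sample)))  # 400 words -> 3 chunks of <= 150 words
```

A caveat of any such chunking scheme is that sentence boundaries near a chunk edge lose context, which is why the model itself accepts up to 512 tokens per chunk.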
2022-02-24 English V6 Release
- New `en_v6` models;
- Quality improvements for English models;
2022-02-28 Experimental Pip Package
- Models are downloaded on demand both by `pip` and PyTorch Hub;
- If you need caching, do it manually or by invoking a necessary model once (it will be downloaded to a cache folder);
- Please see these docs for more information;
- PyTorch Hub and the pip package are based on the same code, hence all examples historically based on `torch.hub.load` can be used with the pip package;
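The on-demand download and caching behavior can be warmed up manually; a sketch (the `silero_stt` entry point and keyword arguments follow the repo's documented `torch.hub.load` usage, and the warm-up call is commented out here because it fetches model weights over the network):

```python
import torch

# torch.hub keeps downloaded models under a local cache directory;
# invoking a model once populates this cache.
print(torch.hub.get_dir())  # the folder models are downloaded into

# One-off warm-up for caching (network download; uncomment to run):
# model, decoder, utils = torch.hub.load(
#     repo_or_dir='snakers4/silero-models',
#     model='silero_stt',
#     language='en',
#     device=torch.device('cpu'),
# )
```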