Wav2Vec2 STT Python
Beta Software
Simple Python library, distributed via binary wheels with few direct dependencies, for easily using wav2vec 2.0 models for speech recognition.
Requirements:
- Python 3.7+
- Platform: Linux x64 (Windows is a work in progress; macOS may work; PRs welcome)
- Python package requirements: cffi, numpy
- wav2vec 2.0 model (must be converted to a compatible format)
  - Several are available ready-to-go on this project's releases page and below.
  - You can convert your own models by following the instructions here.
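The requirements listed above can be checked before installing with a short script. This is only a sketch: the function name `check_requirements` and its parameters are hypothetical and not part of this library, and the platform check simply mirrors the "Linux x64" line above.

```python
import importlib.util
import platform
import sys


def check_requirements(version_info=None, system=None, machine=None,
                       modules=("cffi", "numpy")):
    """Return a list of human-readable problems; an empty list means all checks passed.

    Hypothetical helper: the parameters exist only so the checks can be
    exercised with explicit values instead of the running interpreter's.
    """
    version_info = sys.version_info if version_info is None else version_info
    system = platform.system() if system is None else system
    machine = platform.machine() if machine is None else machine

    problems = []
    # The README states Python 3.7+ as the minimum version.
    if tuple(version_info[:2]) < (3, 7):
        problems.append("Python 3.7+ is required")
    # Only Linux x64 is listed as supported; other platforms may still work.
    if system != "Linux" or machine not in ("x86_64", "AMD64"):
        problems.append(f"untested platform: {system} {machine}")
    # find_spec returns None when a top-level module cannot be found.
    for name in modules:
        if importlib.util.find_spec(name) is None:
            problems.append(f"missing Python package: {name}")
    return problems


if __name__ == "__main__":
    for problem in check_requirements():
        print("warning:", problem)
```

A non-empty result does not necessarily mean the library will fail (macOS, for instance, "may work"); it only flags deviations from the listed requirements.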
Models:
Model | Download Size |
---|---|
Facebook Wav2Vec2 2.0 Base (960h) | 360 MB |
Facebook Wav2Vec2 2.0 Large (960h) | 1.18 GB |
Facebook Wav2Vec2 2.0 Large LV60 (960h) | 1.18 GB |
Facebook Wav2Vec2 2.0 Large LV60 Self (960h) | 1.18 GB |
Usage
import wave
from wav2vec2_stt import Wav2Vec2STT

# Load a converted model from its directory.
decoder = Wav2Vec2STT('model_dir')

# Read the raw audio frames, closing the file when done.
with wave.open('tests/test.wav', 'rb') as wav_file:
    wav_samples = wav_file.readframes(wav_file.getnframes())
assert decoder.decode(wav_samples).strip().lower() == 'it depends on the context'
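When decoding arbitrary recordings, it may help to check the WAV format before passing frames to the decoder. The sketch below assumes 16 kHz, mono, 16-bit PCM input. That format is an assumption based on how wav2vec 2.0 models are typically trained, not a documented requirement of this library. The helpers `read_wav_frames` and `decode_wav` are hypothetical names; `decoder` is any object with a `decode(bytes)` method, such as the `Wav2Vec2STT` instance above.

```python
import wave


def read_wav_frames(path, expected_rate=16000):
    """Return the raw PCM frames of a WAV file after checking its format.

    The 16 kHz / mono / 16-bit expectations are assumptions, not
    requirements documented by the library.
    """
    with wave.open(path, "rb") as wav_file:
        if wav_file.getframerate() != expected_rate:
            raise ValueError(
                f"expected {expected_rate} Hz, got {wav_file.getframerate()} Hz")
        if wav_file.getnchannels() != 1:
            raise ValueError("expected mono audio")
        if wav_file.getsampwidth() != 2:
            raise ValueError("expected 16-bit samples")
        return wav_file.readframes(wav_file.getnframes())


def decode_wav(decoder, path):
    """Decode one WAV file with any object exposing decode(bytes) -> str."""
    return decoder.decode(read_wav_frames(path)).strip()
```

With the decoder from the usage example, `decode_wav(decoder, 'tests/test.wav')` would return the transcript, or raise `ValueError` if the file does not match the assumed format.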
It also includes a simple CLI for recognizing WAV files:
$ python -m wav2vec2_stt decode model test.wav
IT DEPENDS ON THE CONTEXT
$ python -m wav2vec2_stt decode model test.wav test.wav
IT DEPENDS ON THE CONTEXT
IT DEPENDS ON THE CONTEXT
$ python -m wav2vec2_stt -h
usage: python -m wav2vec2_stt [-h] {decode} ...

positional arguments:
  {decode}    sub-command
    decode    decode one or more WAV files

optional arguments:
  -h, --help  show this help message and exit
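The CLI shown above can also be driven from a script. The following is a minimal sketch, assuming the `decode` sub-command prints exactly one transcript line per input file, in order, as in the examples; if the tool prints anything else to standard output, the pairing would be wrong. The function names `build_decode_command` and `transcribe_files` are hypothetical.

```python
import subprocess
import sys


def build_decode_command(model_dir, wav_paths):
    """Build the argument list for `python -m wav2vec2_stt decode MODEL FILE...`."""
    return [sys.executable, "-m", "wav2vec2_stt", "decode", model_dir, *wav_paths]


def transcribe_files(model_dir, wav_paths):
    """Run the CLI and pair each input path with one line of output.

    Assumes one transcript line per file, in input order. Returns a list
    of (path, transcript) tuples so that repeated paths are preserved.
    """
    result = subprocess.run(
        build_decode_command(model_dir, wav_paths),
        capture_output=True,  # requires Python 3.7+
        text=True,
        check=True,           # raise CalledProcessError on a non-zero exit
    )
    return list(zip(wav_paths, result.stdout.splitlines()))
```

Using `sys.executable` keeps the subprocess in the same Python environment where the package is installed.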
Installation/Building
Recommended installation via wheel from pip (requires a recent version of pip):
python -m pip install wav2vec2_stt
See setup.py for more details on building it yourself.
Author
- David Zurow (@daanzu)
License
This project is licensed under the GNU Affero General Public License v3 (AGPL-3.0-or-later). See the LICENSE file for details. If this license is problematic for you, please contact me.
Acknowledgments
- Contains and uses code from PyTorch and torchaudio, licensed under the BSD 2-Clause License.