sbrt2017
Towards an end-to-end speech recognizer for Portuguese using deep neural networks
This repository contains the implementation of the SBRT 2017 paper "Towards an end-to-end speech recognizer for Portuguese using deep neural networks".
Training a character-based all-neural Brazilian Portuguese speech recognition model
The model was trained using four datasets: CSLU Spoltech (LDC2006S16), Sid, VoxForge, and LapsBM1.4. Only the CSLU dataset must be purchased; the others are freely available.
Setting up the (partial) Brazilian Portuguese Speech Dataset (BRSD)
You can download the freely available datasets with the provided script (it may take a while):
$ cd data; sh download_datasets.sh
Next, you can preprocess the downloaded datasets into an HDF5 file. More information is available in the linked documentation.
$ python -m extras.make_dataset --parser brsd
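Once the command above has produced the HDF5 file, it can help to check what it contains before training. The following is a minimal sketch, not part of the repository: it assumes the `h5py` package is installed and that the file is at `.datasets/brsd/data.h5` (the path used by the training command). The helper name `describe` is hypothetical, and no particular group or dataset names are assumed; the script only lists whatever it finds.

```python
import os


def describe(node, prefix=""):
    """Recursively list (path, shape, dtype) for every dataset under `node`.

    `node` only needs an `items()` method, as h5py files and groups provide,
    so the helper makes no assumption about the names stored in the file.
    """
    entries = []
    for name, child in node.items():
        path = prefix + "/" + name
        if hasattr(child, "items"):
            # Groups are containers: descend into them.
            entries.extend(describe(child, path))
        else:
            # Datasets are leaves: record their shape and dtype.
            shape = getattr(child, "shape", None)
            dtype = str(getattr(child, "dtype", "unknown"))
            entries.append((path, shape, dtype))
    return entries


if __name__ == "__main__":
    # Path taken from the training command; adjust it if yours differs.
    path = ".datasets/brsd/data.h5"
    if os.path.exists(path):
        import h5py  # assumption: h5py is the installed HDF5 binding

        with h5py.File(path, "r") as h5_file:
            for entry in describe(h5_file):
                print("%s  shape=%s  dtype=%s" % entry)
    else:
        print("%s not found; run the preprocessing step first" % path)
```

Because `describe` relies only on `items()`, it can also be tried on plain nested dictionaries, which is convenient for a quick test without any audio data.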
Training the network
You can train the network with the main.py script. For more usage information, see the linked documentation. To train with the default parameters:
$ python main.py train --dataset .datasets/brsd/data.h5
Pre-trained model
You can download an sbrt2017 model pre-trained on the full BRSD dataset (including the CSLU dataset):
$ cd data; sh download_model.sh
You can also evaluate the model against the BRSD test set:
$ python main.py eval --model data/models/sbrt2017.h5 --dataset .datasets/brsd/data.h5
Requirements
- Python 2.7
- Numpy
- Scipy
- Pyyaml
- HDF5
- Unidecode
- Librosa
- Tensorflow
- Keras
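Before training, it may be worth confirming that the packages listed above can be imported. The sketch below is not part of the repository. The mapping from each requirement to a module name is an assumption (in particular, `h5py` is assumed to be the Python binding for HDF5 and `yaml` the module provided by Pyyaml), and the function name `find_missing` is hypothetical. It uses only `importlib.import_module`, which is available in Python 2.7.

```python
import importlib

# Requirement label -> importable module name.
# Assumption: this mapping is inferred from the list above, not from the repo.
REQUIRED = [
    ("Numpy", "numpy"),
    ("Scipy", "scipy"),
    ("Pyyaml", "yaml"),
    ("HDF5", "h5py"),
    ("Unidecode", "unidecode"),
    ("Librosa", "librosa"),
    ("Tensorflow", "tensorflow"),
    ("Keras", "keras"),
]


def find_missing(requirements):
    """Return (label, module, reason) for every module that fails to import."""
    missing = []
    for label, module_name in requirements:
        try:
            importlib.import_module(module_name)
        except Exception as exc:
            # A package that fails to import for any reason is unusable,
            # so broken installs are reported alongside absent ones.
            missing.append((label, module_name, repr(exc)))
    return missing


if __name__ == "__main__":
    problems = find_missing(REQUIRED)
    if not problems:
        print("All listed packages were imported successfully.")
    for label, module_name, reason in problems:
        print("%s (module '%s') is not usable: %s" % (label, module_name, reason))
```

The check does not verify package versions or the Python 2.7 requirement itself; it only reports whether each module can be imported in the current environment.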
Acknowledgements
- python_speech_features for the audio preprocessing
- Google Magenta for the hparams
- @robertomest for helping me with everything
- SANTOS, S. C. B.; ALCAIM, A. "Reduced Sets of Subword Units for Continuous Speech Recognition of Portuguese". Electronics Letters, v. 36, p. 586-588, 2000.
License
See LICENSE for more information.