Towards an end-to-end speech recognizer for Portuguese using deep neural networks

This repository contains the implementation of the SBRT 2017 paper entitled Towards an end-to-end speech recognizer for Portuguese using deep neural networks.

Training a character-based all-neural Brazilian Portuguese speech recognition model

The model was trained using four datasets: CSLU Spoltech (LDC2006S16), Sid, VoxForge, and LapsBM1.4. Only the CSLU dataset requires a paid license; the other three are freely available.

Setting up the (partial) Brazilian Portuguese Speech Dataset (BRSD)

You can download the freely available datasets with the provided script (it may take a while):

$ cd data; sh download_datasets.sh

Next, preprocess the downloaded datasets into an HDF5 file:

$ python -m extras.make_dataset --parser brsd

Training the network

You can train the network with the main.py script. To train with the default parameters:

$ python main.py train --dataset .datasets/brsd/data.h5
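Before a long training run it can be useful to confirm that the preprocessed dataset is actually in place. The following is a minimal sketch, not part of the repository: the helper names build_train_command, missing_inputs, and run_training are hypothetical, and the only things taken from this README are the main.py script, the train subcommand, the --dataset flag, and the default dataset path.

```python
from __future__ import print_function

import os
import subprocess
import sys

# Dataset path taken from the training command in this README.
DEFAULT_DATASET = os.path.join(".datasets", "brsd", "data.h5")


def build_train_command(dataset_path=DEFAULT_DATASET, python=None):
    """Return the argument list for the README's training command."""
    interpreter = python or sys.executable or "python"
    return [interpreter, "main.py", "train", "--dataset", dataset_path]


def missing_inputs(dataset_path=DEFAULT_DATASET, script="main.py"):
    """Return the required files that do not exist, in a fixed order."""
    return [path for path in (script, dataset_path) if not os.path.isfile(path)]


def run_training(dataset_path=DEFAULT_DATASET):
    """Check the inputs, then launch training; returns the exit code."""
    missing = missing_inputs(dataset_path)
    if missing:
        print("Missing files:", ", ".join(missing))
        return 1
    return subprocess.call(build_train_command(dataset_path))
```

Calling run_training() from the repository root runs the same command as above, but stops early with a message if main.py or the HDF5 file cannot be found.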

Pre-trained model

You may download an sbrt2017 model pre-trained on the full BRSD dataset (including the CSLU dataset):

$ cd data; sh download_model.sh

You can also evaluate the model against the BRSD test set:

$ python main.py eval --model data/models/sbrt2017.h5 --dataset .datasets/brsd/data.h5
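This README does not say which metric the eval command reports. For a character-based recognizer, a commonly used measure is the character error rate: the edit distance between the reference and predicted transcriptions, divided by the number of reference characters. The sketch below illustrates that general metric only; it is an assumption for illustration, not the repository's evaluation code, and the function names are hypothetical.

```python
def edit_distance(reference, hypothesis):
    """Levenshtein distance: insertions, deletions, substitutions cost 1."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, 1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, 1):
            substitution = previous[j - 1] + (0 if ref_char == hyp_char else 1)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


def character_error_rate(references, hypotheses):
    """Total edit distance divided by the total number of reference characters."""
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have the same length")
    total_chars = sum(len(ref) for ref in references)
    if total_chars == 0:
        raise ValueError("references contain no characters")
    total_edits = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    return total_edits / float(total_chars)
```

Under Python 2.7, pass unicode strings rather than byte strings, so that accented Portuguese characters are compared as single characters.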

Requirements

  • Python 2.7
  • Numpy
  • Scipy
  • Pyyaml
  • HDF5
  • Unidecode
  • Librosa
  • Tensorflow
  • Keras
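A quick way to see which of these requirements are already installed is to try importing each one. The sketch below is not part of the repository. The import names are the usual ones for these packages, and mapping the "HDF5" entry to the h5py module is an assumption, since the list does not name the Python binding.

```python
from __future__ import print_function

import importlib

# (name as listed above, module name to import).
# "h5py" for HDF5 is an assumption; the list does not name the binding.
REQUIRED_MODULES = [
    ("Numpy", "numpy"),
    ("Scipy", "scipy"),
    ("Pyyaml", "yaml"),
    ("HDF5", "h5py"),
    ("Unidecode", "unidecode"),
    ("Librosa", "librosa"),
    ("Tensorflow", "tensorflow"),
    ("Keras", "keras"),
]


def find_missing(modules=REQUIRED_MODULES):
    """Return the listed names whose module cannot be imported."""
    missing = []
    for label, module_name in modules:
        try:
            importlib.import_module(module_name)
        except ImportError:
            missing.append(label)
    return missing
```

Running print(find_missing()) in the target environment prints an empty list when everything can be imported. The check says nothing about versions, which this list does not specify.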

Acknowledgements

  • python_speech_features for the audio preprocessing
  • Google Magenta for the hparams
  • @robertomest for helping me with everything
  • SANTOS, S. C. B.; ALCAIM, A. "Reduced Sets of Subword Units for Continuous Speech Recognition of Portuguese". Electronics Letters, v. 36, p. 586-588, 2000.

License

See LICENSE for more information.