Listen-Attend-Spell-v2
PyTorch implementation of Listen Attend and Spell Automatic Speech Recognition (ASR).
@article{chan2015las,
title={Listen, Attend and Spell},
author={William Chan and Navdeep Jaitly and Quoc V. Le and Oriol Vinyals},
journal={arXiv:1508.01211},
year={2015}
}
DataSet
Introduction
Aishell is an open-source Mandarin Chinese speech corpus published by Beijing Shell Shell Technology Co., Ltd.
400 people from different accent areas in China were invited to participate in the recording, which was conducted in a quiet indoor environment using a high-fidelity microphone and downsampled to 16 kHz. Through professional speech annotation and strict quality inspection, the manual transcription accuracy is above 95%. The data is free for academic use. We hope to provide a moderate amount of data for new researchers in the field of speech recognition.
@inproceedings{aishell_2017,
title={AIShell-1: An Open-Source Mandarin Speech Corpus and A Speech Recognition Baseline},
author={Hui Bu and Jiayu Du and Xingyu Na and Bengu Wu and Hao Zheng},
booktitle={Oriental COCOSDA 2017},
pages={Submitted},
year={2017}
}
Obtain
Create a data folder and download the archive into it:
$ wget http://www.openslr.org/resources/33/data_aishell.tgz
Dependencies
- Python 3.6
- PyTorch 1.0.0
Usage
Data wrangling
Extract data_aishell.tgz:
$ python extract.py
Extract wav files into train/dev/test folders:
$ cd data/data_aishell/wav
$ find . -name '*.tar.gz' -execdir tar -xzvf '{}' \;
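For systems without find and tar, the same step can be approximated in Python. This is a minimal sketch using only the standard library, not part of the repository; the function name extract_nested_archives is hypothetical, and it assumes each archive should be unpacked into the directory that contains it, as the -execdir form of the command does.

```python
import tarfile
from pathlib import Path


def extract_nested_archives(wav_root):
    """Extract every *.tar.gz under wav_root into the directory that holds it.

    Hypothetical helper mirroring the find/tar one-liner.
    Returns the list of archives that were extracted.
    """
    extracted = []
    # sorted() materializes the listing first, so files created while
    # extracting are not picked up by the search.
    for archive in sorted(Path(wav_root).rglob("*.tar.gz")):
        with tarfile.open(archive, "r:gz") as tar:
            tar.extractall(path=archive.parent)
        extracted.append(archive)
    return extracted
```

It would be called as extract_nested_archives("data/data_aishell/wav") from the repository root.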
Scan transcript data, generate features:
$ python pre_process.py
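To give a rough idea of what scanning the transcript involves, here is a small sketch. It is not the code in pre_process.py. It assumes each transcript line holds an utterance ID followed by space-separated words, that labels are individual characters, and that the vocabulary carries `<sos>` and `<eos>` markers; the function names and the special tokens are illustrative assumptions. Acoustic feature extraction is not shown.

```python
from collections import Counter


def parse_transcript(lines):
    """Map utterance ID -> transcript text with the inner spaces removed."""
    transcripts = {}
    for line in lines:
        parts = line.strip().split()
        if len(parts) < 2:
            continue  # skip blank lines and IDs without any text
        utt_id, words = parts[0], parts[1:]
        transcripts[utt_id] = "".join(words)
    return transcripts


def build_vocab(transcripts, specials=("<sos>", "<eos>")):
    """Assign an integer index to each special token and each character."""
    counts = Counter(ch for text in transcripts.values() for ch in text)
    # Most frequent characters first; ties are broken by code point so the
    # mapping is deterministic.
    ordered = sorted(counts, key=lambda ch: (-counts[ch], ch))
    return {tok: i for i, tok in enumerate(list(specials) + ordered)}


def encode(text, vocab):
    """Convert a transcript into label indices, terminated by <eos>."""
    return [vocab[ch] for ch in text] + [vocab["<eos>"]]
```

In use, the lines would come from open("data/data_aishell/transcript/aishell_transcript_v0.8.txt", encoding="utf-8"), and the encoded labels would be stored alongside the path of the matching wav file.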
The folder structure under the data folder should now look something like this:
data/
    data_aishell.tgz
    data_aishell/
        transcript/
            aishell_transcript_v0.8.txt
        wav/
            train/
            dev/
            test/
    aishell.pickle
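Before training, it can help to confirm that the layout is in place. The sketch below is a hypothetical helper, not part of the repository. It assumes the transcript and wav folders sit under data/data_aishell, and it checks only the extracted corpus files, not aishell.pickle.

```python
from pathlib import Path

# Paths relative to the data folder; the nesting is an assumption based on
# the listing in this README.
EXPECTED = [
    "data_aishell/transcript/aishell_transcript_v0.8.txt",
    "data_aishell/wav/train",
    "data_aishell/wav/dev",
    "data_aishell/wav/test",
]


def missing_paths(data_root="data"):
    """Return the expected entries that do not exist under data_root."""
    root = Path(data_root)
    return [rel for rel in EXPECTED if not (root / rel).exists()]
```

An empty list from missing_paths() means every expected entry was found; otherwise the returned names show which step needs to be re-run.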
Train
$ python train.py
To visualize the training process:
$ tensorboard --logdir=runs
Demo
$ python demo.py
Results
Reference
[1] W. Chan, N. Jaitly, Q. Le, and O. Vinyals, “Listen, attend and spell: A neural network for large vocabulary conversational speech recognition,” in ICASSP 2016. (https://arxiv.org/abs/1508.01211v2)