Listen Attend and Spell

PyTorch implementation of Listen, Attend and Spell (LAS) for Automatic Speech Recognition (ASR), based on the paper:

@article{chan2015las,
  title={Listen, Attend and Spell},
  author={William Chan and Navdeep Jaitly and Quoc V. Le and Oriol Vinyals},
  journal={arXiv:1508.01211},
  year={2015}
}

Dataset

Introduction

AISHELL is an open-source Mandarin Chinese speech corpus published by Beijing Shell Shell Technology Co., Ltd.

400 speakers from different accent regions of China took part in the recording, which was conducted in a quiet indoor environment using high-fidelity microphones; the audio was then downsampled to 16 kHz. Through professional speech annotation and strict quality inspection, the manual transcription accuracy is above 95%. The data is free for academic use. We hope to provide a moderate amount of data for new researchers in the field of speech recognition.
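The description states that the audio is sampled at 16 kHz. The following is a minimal sketch for checking a downloaded file with Python's standard wave module; the helper names describe_wav and is_16k are hypothetical and not part of this repository, and only the 16 kHz rate comes from the corpus description.

```python
import wave

# 16 kHz is the rate given in the AISHELL description; nothing else is assumed.
EXPECTED_RATE = 16000


def describe_wav(path):
    """Return (sample_rate, channels, sample_width_bytes, duration_seconds)."""
    with wave.open(path, "rb") as w:
        rate = w.getframerate()
        channels = w.getnchannels()
        width = w.getsampwidth()
        duration = w.getnframes() / rate
    return rate, channels, width, duration


def is_16k(path):
    """True if the WAV header reports the expected 16 kHz sample rate."""
    return describe_wav(path)[0] == EXPECTED_RATE
```

For example, `describe_wav("data/data_aishell/wav/train/S0002/BAC009S0002W0122.wav")` (an illustrative path) reports the header values of a single utterance.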

@inproceedings{aishell_2017,
  title={AIShell-1: An Open-Source Mandarin Speech Corpus and A Speech Recognition Baseline},
  author={Hui Bu and Jiayu Du and Xingyu Na and Bengu Wu and Hao Zheng},
  booktitle={Oriental COCOSDA 2017},
  pages={Submitted},
  year={2017}
}

Obtain

Create a data folder, change into it, then run:

$ wget http://www.openslr.org/resources/33/data_aishell.tgz

Dependencies

Python 3.6
PyTorch 1.0.0

Usage

Data wrangling

Extract data_aishell.tgz:

$ python extract.py
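The contents of extract.py are not shown in this README. A minimal stand-in using Python's tarfile module is sketched here, assuming the script simply unpacks data/data_aishell.tgz into data/; extract_archive is a hypothetical name.

```python
import os
import tarfile


def extract_archive(archive_path, dest_dir):
    """Unpack a gzipped tar archive into dest_dir and return its member names."""
    os.makedirs(dest_dir, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as tar:
        names = tar.getnames()
        # Only extract archives from a trusted source such as the official mirror.
        tar.extractall(path=dest_dir)
    return names
```

Usage, following the data/ layout described in this README: `extract_archive("data/data_aishell.tgz", "data")`.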

Extract wav files into train/dev/test folders:

$ cd data/data_aishell/wav
$ find . -name '*.tar.gz' -execdir tar -xzvf '{}' \;
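On systems without find, a rough Python equivalent of this command walks the directory tree and unpacks each *.tar.gz into the directory that contains it. This is a sketch only; extract_nested_tarballs is a hypothetical helper, not part of the repository.

```python
import os
import tarfile


def extract_nested_tarballs(root):
    """Extract every *.tar.gz under root into its own directory.

    Returns the list of archive paths that were extracted.
    """
    archives = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if name.endswith(".tar.gz"):
                archives.append(os.path.join(dirpath, name))
    # Collect first, then extract, so freshly unpacked files are not re-walked.
    for path in archives:
        with tarfile.open(path, "r:gz") as tar:
            tar.extractall(path=os.path.dirname(path))
    return archives
```

Usage: `extract_nested_tarballs("data/data_aishell/wav")`.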

Scan the transcripts and generate features:

$ python pre_process.py
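The output format of pre_process.py is not documented here. As an illustration of the transcript-scanning half of this step only, the following sketch assumes each line of aishell_transcript_v0.8.txt is an utterance ID followed by space-separated words, and builds a character-level vocabulary. The special-token names and all three helpers are hypothetical, and feature extraction is not covered.

```python
from collections import Counter

# Hypothetical special tokens; pre_process.py may use different names or ids.
PAD, SOS, EOS = "<pad>", "<sos>", "<eos>"


def parse_transcript(lines):
    """Map utterance id -> transcript with word-segmentation spaces removed.

    Assumes each line looks like '<utt_id> word word ...'.
    """
    transcripts = {}
    for line in lines:
        parts = line.strip().split(maxsplit=1)
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        utt_id, text = parts
        transcripts[utt_id] = "".join(text.split())
    return transcripts


def build_vocab(transcripts):
    """Special tokens first, then characters ordered by descending frequency."""
    counts = Counter(ch for text in transcripts.values() for ch in text)
    chars = sorted(counts, key=lambda c: (-counts[c], c))
    itos = [PAD, SOS, EOS] + chars
    return {tok: i for i, tok in enumerate(itos)}


def encode(text, vocab):
    """Convert a transcript into token ids wrapped in <sos> ... <eos>."""
    return [vocab[SOS]] + [vocab[ch] for ch in text] + [vocab[EOS]]
```

Usage: open data/data_aishell/transcript/aishell_transcript_v0.8.txt with `encoding="utf-8"` and pass the file object to `parse_transcript`, then call `build_vocab` on the result.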

The folder structure under data/ should now look like this:

data/
    data_aishell.tgz
    data_aishell/
        transcript/
            aishell_transcript_v0.8.txt
        wav/
            train/
            dev/
            test/
    aishell.pickle

Train

$ python train.py

To visualize the training process:

$ tensorboard --logdir=runs

Demo

$ python demo.py

Results

Reference

[1] W. Chan, N. Jaitly, Q. Le, and O. Vinyals, “Listen, attend and spell: A neural network for large vocabulary conversational speech recognition,” in ICASSP 2016. (https://arxiv.org/abs/1508.01211v2)