Add training recipes for HuBERT model pre-training and ASR fine-tuning
🚀 The feature
Hidden-Unit BERT (HuBERT) is a self-supervised model for speech representations that is widely used in downstream tasks such as speech recognition, speaker diarization, and speaker identification. It can achieve an impressive word error rate when fine-tuned on as little as 10 minutes of supervised data.
To fine-tune the HuBERT model for a customized downstream task, users currently need to install fairseq and adapt their training pipeline to it. It would be great to add a training recipe to torchaudio that loads torchaudio's HuBERT model and simplifies the training process.
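As a rough illustration, the recipe could build on the model factory and pretrained pipeline torchaudio already ships. A minimal sketch, assuming `torchaudio.models.hubert_base` for a from-scratch model and the `HUBERT_BASE` pipeline for pretrained weights:

```python
import torch
import torchaudio

# Randomly initialized HuBERT Base, e.g. as the starting point for pre-training.
model = torchaudio.models.hubert_base()

# Pretrained weights for downstream fine-tuning or feature extraction.
bundle = torchaudio.pipelines.HUBERT_BASE
pretrained = bundle.get_model()

# extract_features returns a list of per-layer representations.
waveform = torch.randn(1, 16000)  # 1 second of dummy audio at 16 kHz
features, lengths = pretrained.extract_features(waveform)
```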
Motivation, pitch
- [x] Add preprocessing scripts (MFCC feature extraction, k-means model training, pseudo-label prediction); see the first sketch after this list.
- [x] Add a PyTorch-Lightning trainer for HuBERT Base model pre-training using MFCC features; see the trainer sketch below.
- [ ] Add a PyTorch-Lightning trainer for HuBERT Large model pre-training using HuBERT Base model representations.
- [ ] Add a PyTorch-Lightning trainer for HuBERT Large model fine-tuning on LibriSpeech ASR task.
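For the preprocessing item, a hedged sketch of what MFCC extraction, k-means training, and pseudo-label prediction could look like. The 13-coefficient MFCCs, scikit-learn's `MiniBatchKMeans`, and the file paths are illustrative assumptions, not the recipe's actual code (the HuBERT paper uses 100 clusters for its first, MFCC-based iteration):

```python
import torch
import torchaudio
from sklearn.cluster import MiniBatchKMeans

def extract_mfcc(path: str) -> torch.Tensor:
    """Return (num_frames, n_mfcc) MFCC features for one mono utterance."""
    waveform, sample_rate = torchaudio.load(path)
    transform = torchaudio.transforms.MFCC(sample_rate=sample_rate, n_mfcc=13)
    # (channel, n_mfcc, time) -> (time, n_mfcc)
    return transform(waveform).squeeze(0).transpose(0, 1)

# Fit k-means on frames pooled across the corpus; the file list is a placeholder.
files = ["utt1.flac", "utt2.flac"]
frames = torch.cat([extract_mfcc(f) for f in files])
kmeans = MiniBatchKMeans(n_clusters=100).fit(frames.numpy())

# Pseudo-labels: one cluster id per frame, the prediction target for pre-training.
labels = {f: kmeans.predict(extract_mfcc(f).numpy()) for f in files}
```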
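And a skeletal PyTorch-Lightning module for the Base pre-training trainer. The plain frame-level cross-entropy over cluster ids is a simplification (actual HuBERT pre-training masks spans of frames and computes the loss over masked positions), and the optimizer settings are placeholders:

```python
import pytorch_lightning as pl
import torch
import torchaudio

class HuBERTPreTrainModule(pl.LightningModule):
    def __init__(self, num_classes: int = 100, lr: float = 5e-4):
        super().__init__()
        self.model = torchaudio.models.hubert_base()
        self.head = torch.nn.Linear(768, num_classes)  # 768 = Base hidden size
        self.lr = lr

    def training_step(self, batch, batch_idx):
        # labels: frame-level k-means cluster ids (long tensor), assumed to be
        # aligned to the model's output frame rate.
        waveforms, labels = batch
        features, _ = self.model(waveforms)  # (batch, frame, 768)
        logits = self.head(features)
        # cross_entropy expects (batch, classes, frame) against (batch, frame).
        loss = torch.nn.functional.cross_entropy(logits.transpose(1, 2), labels)
        self.log("train_loss", loss)
        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=self.lr)
```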
Alternatives
No response
Additional context
No response