
Add training recipes for HuBERT model pre-training and ASR fine-tuning

Open nateanl opened this issue 4 years ago • 0 comments

🚀 The feature

Hidden-Unit BERT (HuBERT), a self-supervised model for speech representations, has been proposed and is widely used in downstream tasks such as speech recognition, speaker diarization, and speaker identification. It can achieve an impressive word error rate when fine-tuned on as little as 10 minutes of supervised data.

To fine-tune the HuBERT model for a customized downstream task, users currently need to install fairseq and adapt their training pipeline to it. It would be great to add a training recipe to torchaudio that loads torchaudio's HuBERT model and simplifies the training process, as sketched below.
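For illustration, a minimal sketch of what such a simplified fine-tuning step could look like, assuming torchaudio's `hubert_base` factory (which returns the encoder without a task head) and a hypothetical linear CTC head on top; the actual recipe may structure this differently:

```python
# A sketch of supervised ASR fine-tuning on top of torchaudio's HuBERT Base
# encoder. NUM_LABELS and the linear CTC head are illustrative assumptions.
import torch
import torchaudio

NUM_LABELS = 29  # hypothetical CTC vocabulary: characters + blank

encoder = torchaudio.models.hubert_base()
ctc_head = torch.nn.Linear(768, NUM_LABELS)  # 768 = HuBERT Base hidden size
ctc_loss = torch.nn.CTCLoss(blank=0, zero_infinity=True)

def training_step(waveforms, waveform_lengths, targets, target_lengths):
    """One supervised step: encode audio, project to labels, apply CTC."""
    features, feature_lengths = encoder(waveforms, waveform_lengths)
    logits = ctc_head(features)  # (batch, frames, labels)
    log_probs = torch.nn.functional.log_softmax(logits, dim=-1)
    return ctc_loss(
        log_probs.transpose(0, 1),  # CTCLoss expects (frames, batch, labels)
        targets, feature_lengths, target_lengths,
    )
```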

Motivation, pitch

  • [x] Add preprocessing scripts (MFCC feature extraction, KMeans model training, pseudo-label prediction); see the preprocessing sketch after this list.
  • [x] Add a PyTorch-Lightning trainer for HuBERT Base model pre-training using MFCC features.
  • [ ] Add a PyTorch-Lightning trainer for HuBERT Large model pre-training using HuBERT Base model representations.
  • [ ] Add a PyTorch-Lightning trainer for HuBERT Large model fine-tuning on LibriSpeech ASR task.
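As a rough sketch of the first preprocessing step above: extract MFCC features with torchaudio and cluster them with k-means to produce pseudo-labels. The file paths, the 13-dimensional MFCC setting, and the cluster count of 100 (the HuBERT paper's first-iteration choice) are illustrative assumptions, not the final recipe's parameters:

```python
# First-iteration pseudo-label generation: MFCC features + k-means clusters.
import torch
import torchaudio
from sklearn.cluster import MiniBatchKMeans

mfcc = torchaudio.transforms.MFCC(sample_rate=16000, n_mfcc=13)

def extract_features(path):
    """Load one utterance and return its per-frame MFCC features."""
    waveform, sample_rate = torchaudio.load(path)
    assert sample_rate == 16000
    return mfcc(waveform).squeeze(0).transpose(0, 1)  # (frames, n_mfcc)

# Fit k-means on features pooled from (a subset of) the corpus ...
kmeans = MiniBatchKMeans(n_clusters=100, batch_size=10000)
features = torch.cat([extract_features(p) for p in ["a.flac", "b.flac"]])
kmeans.fit(features.numpy())

# ... then predict a frame-level pseudo-label sequence per utterance.
pseudo_labels = kmeans.predict(extract_features("a.flac").numpy())
```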

Alternatives

No response

Additional context

No response

nateanl · Oct 20 '21 18:10