ContentVec: An Improved Self-Supervised Speech Representation by Disentangling Speakers
This repository provides the official PyTorch implementation of ContentVec.
This is a short video that explains the main concepts of our work. If you find this work useful and use it in your research, please consider citing our paper.
Cite this paper
https://proceedings.mlr.press/v162/qian22b.html
Pre-trained models
The legacy models contain only the representation module and can be loaded with a plain fairseq installation, without setting up this code repo.
Model | Classes | Download |
---|---|---|
ContentVec_legacy | 100 | download |
ContentVec | 100 | download |
ContentVec_legacy | 500 | download |
ContentVec | 500 | download |
Load a model
import fairseq

# Path to a downloaded legacy checkpoint
ckpt_path = "/path/to/the/checkpoint_best_legacy.pt"
models, cfg, task = fairseq.checkpoint_utils.load_model_ensemble_and_task([ckpt_path])
model = models[0]
Train a new model
Data preparation
Download the zip file, which contains the following files:
- {train,valid}.tsv: waveform list files (in metadata)
- {train,valid}.km: frame-aligned pseudo label files (in labels)
- dict.km.txt: a dummy dictionary (in labels)
- spk2info.dict: a dictionary mapping speaker ID to speaker embedding (in metadata)
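After unzipping, a quick check that every listed file is in place can catch a bad download before training starts. The sketch below assumes that metadata and labels are subdirectories of a single data root. The function name and the `data_root` argument are illustrative and not part of this repo.

```python
from pathlib import Path

# Expected layout, read off the file list above. Treating "metadata" and
# "labels" as subdirectories of one data root is an assumption.
EXPECTED_FILES = [
    "metadata/train.tsv",
    "metadata/valid.tsv",
    "metadata/spk2info.dict",
    "labels/train.km",
    "labels/valid.km",
    "labels/dict.km.txt",
]


def missing_data_files(data_root):
    """Return the expected files that are absent under data_root."""
    root = Path(data_root)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]
```

Calling `missing_data_files("/path/to/data")` returns an empty list when the layout is complete, and otherwise the relative paths that still need to be put in place.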
Modify the root directory in the {train,valid}.tsv waveform list files.
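Editing the root directory can be scripted. The sketch below assumes the usual fairseq manifest layout, in which the first line of each .tsv file holds the root directory and every following line holds a relative audio path and a sample count separated by a tab. Check one of the downloaded files before relying on this. `set_manifest_root` is a hypothetical helper, not part of this repo.

```python
from pathlib import Path


def set_manifest_root(tsv_path, new_root):
    """Replace the root directory on the first line of a manifest file.

    Assumes the first line is the root directory (fairseq manifest
    convention); all remaining lines are written back unchanged.
    """
    path = Path(tsv_path)
    lines = path.read_text().splitlines()
    if not lines:
        raise ValueError(f"{tsv_path} is empty")
    lines[0] = str(new_root)
    path.write_text("\n".join(lines) + "\n")
```

For example, `set_manifest_root("metadata/train.tsv", "/path/to/audio")` would point the training manifest at a local audio directory; both paths here are placeholders. The helper rewrites the file in place, so keep a copy of the original if in doubt.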
Setup code repo
Follow the steps in setup.sh to set up the code repo.
Pretrain ContentVec
Use run_pretrain_single.sh to run on a single node.
Use run_pretrain_multi.sh and the corresponding Slurm template to run on multiple GPUs and nodes.