DiffSinger_colab_notebook_MLo7
A Colab notebook for DiffSinger training, meant to make training easier.
DiffSinger training notebook:
currently supported data formats:
- lab + wav (NNSVS format)
- csv + wav (DiffSinger format)
- ds (DiffSinger .ds file) [not fully tested in colab]
NOTE:
- the folder name of your_speaker_folder is used as spk_name, so choose your folder names carefully
- the colab notebook relies mainly on Python, so file names or folder paths that contain spaces may be invalid
- for an in-depth guide to SVS training and/or labeling, please see SVS Singing Voice Database - Tutorial
- it is advised to edit your data with SlurCutter to get more refined data for your pitch model
- please visit the DiffSinger Discord for help and questions regarding model production
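The first two notes can be checked before uploading. The sketch below is only an illustration, not code from the notebook: the helper names `speaker_name` and `names_with_spaces` are hypothetical, and it assumes the speaker folder is the top-level folder inside the zip.

```python
from pathlib import PurePosixPath


def speaker_name(member_path):
    """Return the top-level folder of a zip member path.

    Assumption: this top-level folder is what ends up as spk_name.
    """
    return PurePosixPath(member_path).parts[0]


def names_with_spaces(member_paths):
    """Return every path that contains a space anywhere in it."""
    return [p for p in member_paths if " " in p]


if __name__ == "__main__":
    paths = ["my singer/data_1.wav", "my_singer/data_1.wav"]
    print(names_with_spaces(paths))   # paths to rename before zipping
    print(speaker_name(paths[1]))     # folder that would become spk_name
```

Renaming anything this reports (for example, replacing spaces with underscores) before zipping is one way to follow the note about spaces.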
Zip file format examples:
#single speaker (lab + wav | ds + wav)
your_zip.zip:
    |
    |
    your_speaker_folder:
        |
        |
        data_1.wav
        data_1.lab (or .ds)
        .
        data_2.wav
        data_2.lab (or .ds)
        .
        data_3.wav
        data_3.lab (or .ds)
        .
        ...

#single speaker (csv + wav)
your_zip.zip:
    |
    |
    your_speaker_folder:
        |
        |
        wavs (folder named "wavs" containing all the wavs)
        .
        transcriptions.csv

#multi speaker (lab + wav | ds + wav)
your_zip.zip:
    |
    |
    your_speaker_folder_1:
        |
        |
        data_1.wav
        data_1.lab (or .ds)
        .
        data_2.wav
        data_2.lab (or .ds)
        .
        data_3.wav
        data_3.lab (or .ds)
        .
        ...
    your_speaker_folder_2:
        |
        |
        data_1.wav
        data_1.lab (or .ds)
        .
        data_2.wav
        data_2.lab (or .ds)
        .
        data_3.wav
        data_3.lab (or .ds)
        .
        ...

#multi speaker (csv + wav)
your_zip.zip:
    |
    |
    your_speaker_folder_1:
        |
        |
        wavs (folder named "wavs" containing all the wavs)
        .
        transcriptions.csv
    your_speaker_folder_2:
        |
        |
        wavs (folder named "wavs" containing all the wavs)
        .
        transcriptions.csv
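For the lab + wav and ds + wav layouts, a quick local check can catch unpaired files before upload. This is a minimal sketch, not part of the notebook: `check_lab_wav_layout` and `check_zip` are hypothetical helpers, and the sketch assumes each wav sits directly inside a speaker folder next to a label file with the same base name, as in the examples above.

```python
import zipfile
from collections import defaultdict
from pathlib import PurePosixPath

LABEL_EXTS = {".lab", ".ds"}


def check_lab_wav_layout(names):
    """Report, per speaker folder, the wav files that have no label partner.

    `names` is a list of member paths such as ZipFile.namelist() returns.
    Returns {speaker_folder: [wav base names without a .lab/.ds file]}.
    """
    wavs = defaultdict(set)
    labels = defaultdict(set)
    for name in names:
        path = PurePosixPath(name)
        # Skip directory entries and anything not directly inside a speaker folder.
        if name.endswith("/") or len(path.parts) != 2:
            continue
        speaker, suffix = path.parts[0], path.suffix.lower()
        if suffix == ".wav":
            wavs[speaker].add(path.stem)
        elif suffix in LABEL_EXTS:
            labels[speaker].add(path.stem)
    return {spk: sorted(wavs[spk] - labels[spk]) for spk in wavs}


def check_zip(zip_path):
    """Run the layout check on a zip file on disk."""
    with zipfile.ZipFile(zip_path) as zf:
        return check_lab_wav_layout(zf.namelist())
```

Calling `check_zip("your_zip.zip")` returns one entry per speaker folder; an empty list means every wav in that folder has a matching label.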
Vocoder finetuning notebook:
current supported data format:
- wav
NOTE:
- it is suggested to use manually segmented audio for cleaner segments (though there is minimal difference compared with the auto segmentation)
- the zip can contain any type of file, even subfolders; data extraction adds only the .wav files found inside the zip to the training set
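To preview which files such an extraction would pick up, the zip can be listed locally. The sketch below is an assumption-laden illustration rather than the notebook's actual extraction code: `list_wav_members` is a hypothetical helper, and matching the extension case-insensitively is a choice made here, not something the notebook is known to do.

```python
import zipfile


def list_wav_members(zip_file):
    """List every .wav member of a zip, at any folder depth.

    `zip_file` may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(zip_file) as zf:
        return [
            info.filename
            for info in zf.infolist()
            if not info.is_dir() and info.filename.lower().endswith(".wav")
        ]
```

Anything not listed (text files, images, empty folders) would simply be ignored under the behaviour described in the note.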
SOFA training notebook (wip):
current supported data format:
- lab + wav (NNSVS format)
NOTE:
- this notebook is still a rough draft; please either avoid it or use it with caution
MFA inference notebook:
currently supported data formats:
- wav
- txt + wav (wav with text transcription)
- lab + wav (wav with text transcription)
NOTE:
- this notebook is the alternative forced alignment option, but SOFA should work better on singing data
- zip must have no subfolders
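The no-subfolder rule is easy to verify before uploading. A minimal sketch, assuming the rule means every file sits at the root of the zip; `members_in_subfolders` is a hypothetical helper, not part of the notebook.

```python
import zipfile


def members_in_subfolders(names):
    """Return the member paths that are folders or live inside a folder.

    Zip member names use "/" as the separator, so any "/" means a folder.
    """
    return [name for name in names if "/" in name]


def zip_is_flat(zip_file):
    """True when the zip (path or binary file-like object) has no folders."""
    with zipfile.ZipFile(zip_file) as zf:
        return not members_in_subfolders(zf.namelist())
```

If `zip_is_flat("your_zip.zip")` returns False, re-zipping the files themselves rather than their parent folder usually produces a flat archive.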
Plans (updates might not happen in this order):
- [notebook] improve SOFA notebook, add inference
- [notebook] REMOVE MFA inference notebook (migrate to SOFA or something)
Credits:
- openvpi for DiffSinger fork and more
- UtaUtaUtau for nnsvs-db-converter
- Kei for the original notebook
- MLo7 for the repo's content
- PixPrucer for an in-depth SVS guide
- haru0l for the base pretrain with embeds
- AgentAsteriski for the current local GUI