Amphion Unable to run training script of Natural Speech 2

Hi,

I ran into multiple issues trying to run the training script.

In ns2_dataset.py:

self.utt2phone[utt] = utt_info["phones"]: where does "phones" come from? I suspect we need to run the phonemizer first, but I don't see extract_phone=True in the config file.
utt_info["num_frames"] is utt_info["Duration"], right?
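To narrow this down, here is a minimal sketch of a check that lists which metadata entries lack the keys ns2_dataset.py reads. It assumes the metadata is a list of dicts, one per utterance, and that each entry carries a "Uid" field; both are assumptions about the file layout, and find_missing_keys is a hypothetical helper, not part of Amphion.

```python
from typing import Dict, Iterable, List, Sequence

# Keys that ns2_dataset.py reads from each utt_info, per the question above.
REQUIRED_KEYS = ("phones", "num_frames")


def find_missing_keys(
    metadata: Iterable[dict],
    required: Sequence[str] = REQUIRED_KEYS,
) -> Dict[str, List[str]]:
    """Map each utterance id to the required keys it does not have.

    Assumption: every entry is a dict; "Uid" is used as the identifier when
    present, otherwise the entry's position in the list is used.
    """
    missing: Dict[str, List[str]] = {}
    for index, utt_info in enumerate(metadata):
        absent = [key for key in required if key not in utt_info]
        if absent:
            uid = str(utt_info.get("Uid", index))
            missing[uid] = absent
    return missing


# Toy entries for illustration only; real metadata would be loaded from JSON.
sample = [
    {"Uid": "utt_0001", "Duration": 3.2},
    {"Uid": "utt_0002", "Duration": 1.5, "phones": "HH AH0 L OW1", "num_frames": 120},
]
print(find_missing_keys(sample))
```

If every utterance shows up with "phones" missing, that would suggest the phoneme extraction step was never run on the metadata.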

In exp_config_base.json:

The config sets use_code=true, use_pitch=true, and use_phone. Should extract_acoustic_token=true, extract_pitch=true, and extract_phone=true be set as well?
There also seems to be a mismatch between tts/preprocessing.py and the config file. For example, should code_dir be acoustic_token_dir?
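The flag question can be made concrete with a small consistency check. This is only a sketch: it assumes the flags sit in one flat dict and that each use_* flag pairs with the extract_* flag named in the question above, which may not match how Amphion actually nests or pairs them. find_flag_mismatches is a hypothetical helper.

```python
from typing import List, Mapping, Sequence, Tuple

# Assumed pairing of "use" flags with "extract" flags, taken from the
# question above; not confirmed against the Amphion code.
FLAG_PAIRS: Sequence[Tuple[str, str]] = (
    ("use_code", "extract_acoustic_token"),
    ("use_pitch", "extract_pitch"),
    ("use_phone", "extract_phone"),
)


def find_flag_mismatches(
    cfg: Mapping[str, object],
    pairs: Sequence[Tuple[str, str]] = FLAG_PAIRS,
) -> List[Tuple[str, str]]:
    """Return (use_flag, extract_flag) pairs where the feature is used
    but its extraction flag is missing or not set to True."""
    return [
        (use_flag, extract_flag)
        for use_flag, extract_flag in pairs
        if cfg.get(use_flag) is True and cfg.get(extract_flag) is not True
    ]


# Toy config for illustration; a real one would come from json.load().
example_cfg = {
    "use_code": True,
    "use_pitch": True,
    "use_phone": True,
    "extract_pitch": True,
}
for use_flag, extract_flag in find_flag_mismatches(example_cfg):
    print(f"{use_flag} is true but {extract_flag} is not")
```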

Dec 19 '23 10:12 dongngm

The data processing for NS2 differs somewhat from that of the other TTS models. We will update the data processing section as soon as possible.

Dec 19 '23 11:12 HeCheng0625

Hi @HeCheng0625 ,

I hope this message finds you well. I understand that these things take time and effort, and I appreciate the work you're putting into it.

If possible, could you please provide an estimated timeline for when we might expect the update?

Dec 22 '23 07:12 vn09

Hi, we will release a new checkpoint and data processing pipeline for a large dataset (> 10,000 hours) in about two weeks. For now, we only use LibriTTS to train the model. In the meantime, you can use our model pretrained on LibriTTS: https://huggingface.co/amphion/naturalspeech2_libritts Or try the toy demo: https://huggingface.co/spaces/amphion/NaturalSpeech2

Dec 22 '23 08:12 HeCheng0625

Thanks @HeCheng0625.

Dec 22 '23 10:12 vn09

Hi @HeCheng0625 , I just wanted to hear from you if there have been any updates on the data processing pipeline.

Jan 06 '24 05:01 vn09

Any updates on the data preprocessing pipeline?