
Hello, I have an issue when I try to use another English dataset. I'm wondering why inference from the packed test set works (`CUDA_VISIBLE_DEVICES=0 python tasks/run.py --config usr/configs/midi/e2e/opencpop/ds100_adj_rel.yaml --exp_name $MY_DS_EXP_NAME --reset --infer`) but inference from raw input (`python inference/svs/ds_e2e.py --config usr/configs/midi/e2e/opencpop/ds100_adj_rel.yaml --exp_name $MY_DS_EXP_NAME`) needs the same phoneme set size?

Open michaellin99999 opened this issue 3 years ago • 14 comments

    Hello, I have an issue when I try to use another English dataset. I'm wondering why inference from the packed test set works (`CUDA_VISIBLE_DEVICES=0 python tasks/run.py --config usr/configs/midi/e2e/opencpop/ds100_adj_rel.yaml --exp_name $MY_DS_EXP_NAME --reset --infer`) but inference from raw input (`python inference/svs/ds_e2e.py --config usr/configs/midi/e2e/opencpop/ds100_adj_rel.yaml --exp_name $MY_DS_EXP_NAME`) needs the same phoneme set size?

Originally posted by @Wayne-wonderai in https://github.com/MoonInTheRiver/DiffSinger/issues/29#issuecomment-1260673475

michaellin99999 avatar Oct 03 '22 10:10 michaellin99999

same issue

michaellin99999 avatar Oct 03 '22 10:10 michaellin99999

When using our configs on your dataset, please check that "binary_data_dir" in hparams points to your binarized data directory, because the phoneme dictionary text file there determines the dimension of phone_encoder in the model.

MrZixi avatar Oct 09 '22 08:10 MrZixi
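
To make the point about the phoneme dictionary concrete, here is a minimal sketch in plain Python of why two datasets end up with different encoder sizes. It is not DiffSinger's own code: the `RESERVED` tokens, the `build_vocab` and `compare_vocabs` helpers, and the toy phoneme lists are all assumptions made for illustration.

```python
import json

# Hypothetical reserved tokens; the real encoder may reserve a different set.
RESERVED = ["<pad>", "<EOS>", "<UNK>"]


def build_vocab(phonemes):
    """Map each token to an integer ID, reserved tokens first."""
    tokens = RESERVED + sorted(set(phonemes))
    return {tok: i for i, tok in enumerate(tokens)}


def compare_vocabs(trained, inference):
    """Report how two vocabularies differ in size, membership, and ID assignment."""
    shared = set(trained) & set(inference)
    return {
        "trained_size": len(trained),
        "inference_size": len(inference),
        "missing_at_inference": sorted(set(trained) - set(inference)),
        "unknown_to_model": sorted(set(inference) - set(trained)),
        "remapped": sorted(t for t in shared if trained[t] != inference[t]),
    }


# Toy phoneme sets standing in for the two dictionaries being compared.
mandarin_like = build_vocab(["a", "ai", "b", "zh"])
english_like = build_vocab(["AA", "AE", "B", "ZH", "TH"])

report = compare_vocabs(mandarin_like, english_like)
print(json.dumps(report, indent=2))
```

The vocabulary size is the number of rows in the phoneme embedding table, so a model built from one dictionary cannot load weights trained with a dictionary of a different size. Even when the sizes happen to match, any token listed under `remapped` would be looked up in the wrong embedding row.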

So, by pointing "binary_data_dir" to our own binarized data, the dimension of phone_encoder should change to fit our model?

michaellin99999 avatar Oct 09 '22 09:10 michaellin99999

We get this issue: [Screenshot from 2022-10-04 17-40-29]

michaellin99999 avatar Oct 09 '22 09:10 michaellin99999

When using our configs on your dataset, please check that "binary_data_dir" in hparams points to your binarized data directory, because the phoneme dictionary text file there determines the dimension of phone_encoder in the model.

I get this issue: [Screenshot from 2022-10-04 17-40-29]

michaellin99999 avatar Oct 09 '22 09:10 michaellin99999

Sorry, I may have misunderstood your issue. If you want to run inference from our pretrained ckpt, please make sure your phoneme dictionary is exactly the same as ours, because some layers in the pretrained ckpt depend on it. Otherwise the phoneme units may be encoded incorrectly because the dictionaries differ.

MrZixi avatar Oct 09 '22 09:10 MrZixi

If you want to use a custom phoneme dictionary, please follow our guidance and re-run the training.

MrZixi avatar Oct 09 '22 09:10 MrZixi

If you want to use a custom phoneme dictionary, please follow our guidance and re-run the training.

We did that but ran into the issue above. We retrained FFT and DiffSinger, and when we try to chain them in sequence, the error above is shown. Can you point us to where the model is defined so we can debug what is causing this? We can't pinpoint what requires the missing keys.

michaellin99999 avatar Oct 09 '22 09:10 michaellin99999

If you want to use a custom phoneme dictionary, please follow our guidance and re-run the training.

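We followed this tutorial (https://github.com/MoonInTheRiver/DiffSinger/blob/master/docs/README-SVS.md) and retrained with an English dataset, but when we connect FFT and DiffSinger, the error above is reported: [Screenshot from 2022-10-04 17-40-29]. We can't find which code consumes these state_dict keys. Could you point us to the relevant line? Also, in which file is each layer of the DiffSinger model defined? We can't find that either.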

michaellin99999 avatar Oct 09 '22 09:10 michaellin99999

If you want to use a custom phoneme dictionary, please follow our guidance and re-run the training.

When we retrain (with a different phoneme dimension) and ignore the phoneme content, the validation script can generate a singing voice that resembles the new data, but the inference script doesn't work.

michaellin99999 avatar Oct 09 '22 09:10 michaellin99999

They are in the modules/***.

MrZixi avatar Oct 10 '22 02:10 MrZixi

They are in the modules/***. 125AEE75-AB9E-4197-93AC-F15FABDC3B50

Where does the run.py file get the list of modules to load?

C8860D5C-7BA6-40B6-9AC3-B5D573EAF527

michaellin99999 avatar Oct 10 '22 02:10 michaellin99999

They are in the modules/***.

Thank you. Last question: which code is responsible for checking the model size and parameters, i.e. the code that raises the error when loading the state_dict for fastspeech2MIDI ("missing keys in state_dict")? And can we ignore that error if we are training our own model? FA808808-FD74-4671-BA64-1E20F61FF1EF

michaellin99999 avatar Oct 10 '22 02:10 michaellin99999
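
For anyone debugging the same "missing keys in state_dict" message: PyTorch's `load_state_dict` reports missing keys, unexpected keys, and size mismatches when a checkpoint does not line up with the model it is loaded into. The sketch below reproduces that comparison in plain Python on name-to-shape dictionaries, so the three cases can be told apart before loading anything. The `diff_state_dicts` helper and every parameter name and shape in it are hypothetical, not taken from the DiffSinger code.

```python
def diff_state_dicts(model_shapes, ckpt_shapes):
    """Compare parameter names and shapes the way a strict checkpoint load would."""
    model_keys, ckpt_keys = set(model_shapes), set(ckpt_shapes)
    missing = sorted(model_keys - ckpt_keys)      # model expects, checkpoint lacks
    unexpected = sorted(ckpt_keys - model_keys)   # checkpoint has, model does not
    mismatched = {
        k: {"checkpoint": ckpt_shapes[k], "model": model_shapes[k]}
        for k in sorted(model_keys & ckpt_keys)
        if model_shapes[k] != ckpt_shapes[k]
    }
    return missing, unexpected, mismatched


# Hypothetical parameter names and shapes, for illustration only.
model_shapes = {
    "fs2.encoder.embed_tokens.weight": (8, 256),
    "fs2.pitch_embed.weight": (300, 256),
    "denoise_fn.input_projection.weight": (256, 80, 1),
}
ckpt_shapes = {
    "fs2.encoder.embed_tokens.weight": (7, 256),
    "denoise_fn.input_projection.weight": (256, 80, 1),
    "fs2.midi_embed.weight": (300, 256),
}

missing, unexpected, mismatched = diff_state_dicts(model_shapes, ckpt_shapes)
print("missing:", missing)
print("unexpected:", unexpected)
print("size mismatch:", mismatched)
```

With real objects, the shape dictionaries can be built as `{k: tuple(v.shape) for k, v in module.state_dict().items()}`. A size mismatch on the phoneme embedding points to a different dictionary, while missing or unexpected keys usually mean the inference script builds a different model class, or uses a different key prefix, than the one that was trained.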

They are in the modules/***.

Thank you. Last question: which code is responsible for checking the model size and parameters, i.e. the code that raises the error when loading the state_dict for fastspeech2MIDI ("missing keys in state_dict")? And can we ignore that error if we are training our own model? FA808808-FD74-4671-BA64-1E20F61FF1EF

Hello, may I ask whether you managed to train a model successfully with DiffSinger on an English dataset? Thank you 🙏

li-henan avatar Nov 10 '23 10:11 li-henan