DiffSinger Inference SVS

Hello! Great job! I have a few questions about SVS (POPCS).

How does inference work? Which files are used for inference, and what is the recipe? How do you reproduce the melody (notes) if MIDI is not used?
Can I run inference for English, and if so, what do I need to do? I understand that the accent will remain Chinese. Are you planning further work on other languages?

Jul 27 '22 18:07 ElizavetaSedova

1. I think you should read this file to get a better understanding: https://github.com/MoonInTheRiver/DiffSinger/blob/master/docs/README-SVS.md
2. The EN phoneme dictionary is not the same as the ZH one, so the answer is no. You would need to re-train the model using the International Phonetic Alphabet (IPA), or re-train it on EN datasets.

Jul 28 '22 01:07 MoonInTheRiver

When I run inference with phoneme input, I get the error "can't convert np.ndarray of type numpy.str_ ........."
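For anyone hitting the same message: PyTorch raises it when an array of strings is converted to a tensor, so one plausible cause (an assumption, since the full traceback is not shown) is that the phoneme symbols reached the tensor conversion without first being mapped to integer IDs. A minimal sketch of that mapping step follows; the phoneme inventory and the `encode_phonemes` helper are hypothetical and not taken from the DiffSinger code.

```python
# Hypothetical phoneme inventory; a trained model ships its own dictionary.
PHONEMES = ["<pad>", "<unk>", "AP", "SP", "a", "b", "i", "n", "sh"]
PHONE_TO_ID = {p: i for i, p in enumerate(PHONEMES)}


def encode_phonemes(text):
    """Map a space-separated phoneme string to a list of integer IDs.

    Symbols missing from the inventory fall back to the <unk> ID.
    """
    unk = PHONE_TO_ID["<unk>"]
    return [PHONE_TO_ID.get(tok, unk) for tok in text.split()]


# The integer list (not the raw strings) is what can be turned into a tensor.
ids = encode_phonemes("sh i AP b a")
print(ids)
```

If the IDs come out as expected but the error persists, the string array is probably being passed somewhere else in the pipeline.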

Aug 02 '22 08:08 11721206

We have done exactly that and get this error when running SVS inference:

Traceback (most recent call last):
  File "inference/svs/ds_e2e.py", line 71, in <module>
    DiffSingerE2EInfer.example_run(c)
  File "/media/wonder/6AF274CCF2749E4F/Wayne/DiffSinger/inference/svs/base_svs_infer.py", line 240, in example_run
    infer_ins = cls(hparams)
  File "/media/wonder/6AF274CCF2749E4F/Wayne/DiffSinger/inference/svs/base_svs_infer.py", line 35, in __init__
    self.model = self.build_model()
  File "inference/svs/ds_e2e.py", line 26, in build_model
    load_ckpt(model, hparams['work_dir'], 'model')
  File "/media/wonder/6AF274CCF2749E4F/Wayne/DiffSinger/utils/__init__.py", line 202, in load_ckpt
    cur_model.load_state_dict(state_dict, strict=strict)
  File "/home/wonder/anaconda3/envs/diffsinger_test/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1497, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for GaussianDiffusion:
    size mismatch for fs2.encoder_embed_tokens.weight: copying a param with shape torch.Size([72, 256]) from checkpoint, the shape in current model is torch.Size([64, 256]).
    size mismatch for fs2.encoder.embed_tokens.weight: copying a param with shape torch.Size([72, 256]) from checkpoint, the shape in current model is torch.Size([64, 256]).

The [72, 256] shape comes from retraining with our own EN dataset. We don't know where the current model's torch.Size([64, 256]) is coming from. Any tips?
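For context, the first dimension of an embedding weight is the number of tokens, so 72 versus 64 suggests that the model built at inference time sees a smaller phoneme dictionary than the one used during training. This is an inference from the shapes, not something confirmed in this thread. The sketch below shows one way to list such mismatches before calling `load_state_dict`; the `find_shape_mismatches` helper is hypothetical, and the shapes are copied from the traceback rather than read from real files. With PyTorch, the two dicts could be built as `{k: tuple(v.shape) for k, v in state_dict.items()}` from the checkpoint and from `model.state_dict()`.

```python
def find_shape_mismatches(ckpt_shapes, model_shapes):
    """Return {param_name: (checkpoint_shape, model_shape)} for shared
    parameters whose shapes differ."""
    mismatches = {}
    for name, ckpt_shape in ckpt_shapes.items():
        model_shape = model_shapes.get(name)
        if model_shape is not None and tuple(model_shape) != tuple(ckpt_shape):
            mismatches[name] = (tuple(ckpt_shape), tuple(model_shape))
    return mismatches


# Shapes taken from the error message above (illustrative stand-ins for
# the real checkpoint and the freshly built model).
ckpt_shapes = {
    "fs2.encoder_embed_tokens.weight": (72, 256),
    "fs2.encoder.embed_tokens.weight": (72, 256),
}
model_shapes = {
    "fs2.encoder_embed_tokens.weight": (64, 256),
    "fs2.encoder.embed_tokens.weight": (64, 256),
}

for name, (ckpt, model) in find_shape_mismatches(ckpt_shapes, model_shapes).items():
    # A differing first dimension points at the vocabulary size.
    print(f"{name}: checkpoint {ckpt} vs current model {model}")
```

If only the first dimension differs, as here, it may be worth checking which phoneme dictionary and config the inference script actually loads.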

Oct 03 '22 10:10 michaellin99999

@michaellin99999 Hello, have you managed to solve this yet?

Dec 14 '23 09:12 manhdoan291
