DiffSinger

Inference SVS

Open. ElizavetaSedova opened this issue 2 years ago (4 comments).

Hello! Great job! I would like to know a few things. I am interested in SVS (POPCS).

  1. Can you tell me about inference? Which files are used for inference, and what is the recipe? How do you reproduce the melody (notes) if MIDI is not used?
  2. Can I perform inference for English, and what would that require? I understand that the accent will remain Chinese. Are you planning further work on other languages?

ElizavetaSedova, Jul 27 '22

  1. I think you should read this file to get a better understanding: https://github.com/MoonInTheRiver/DiffSinger/blob/master/docs/README-SVS.md
  2. The phoneme dictionary for EN is not the same as the one for ZH, so the answer is no. You would need to retrain the model using the International Phonetic Alphabet (IPA), or retrain it on EN datasets.

MoonInTheRiver, Jul 28 '22

When running inference with phoneme input, I get the error "can't convert np.ndarray of type numpy.str_ ........."

11721206, Aug 02 '22
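That message typically comes from PyTorch being handed a NumPy array of strings (`numpy.str_`) where it expects numeric token IDs, for example when raw phoneme symbols reach `torch.tensor` or `torch.from_numpy` without being encoded first. A minimal sketch, assuming a plain phoneme-to-ID dictionary built from the same phoneme set used in training; `build_phone2id` and `encode_phonemes` are hypothetical helpers, not DiffSinger's API:

```python
from typing import Dict, List, Sequence


def build_phone2id(phone_set: Sequence[str]) -> Dict[str, int]:
    """Hypothetical helper: give each phoneme a consecutive integer ID.

    Assumption: IDs follow the order of the training phoneme set. A real
    model may reserve extra IDs (e.g. padding), so prefer the project's own
    text encoder when one is available.
    """
    return {ph: i for i, ph in enumerate(phone_set)}


def encode_phonemes(phonemes: Sequence[str], phone2id: Dict[str, int]) -> List[int]:
    """Convert phoneme strings to integer IDs, rejecting unknown symbols."""
    unknown = sorted({ph for ph in phonemes if ph not in phone2id})
    if unknown:
        # e.g. an English phoneme sent to a model trained on a Mandarin set
        raise ValueError(f"phonemes missing from the training dictionary: {unknown}")
    return [phone2id[ph] for ph in phonemes]


if __name__ == "__main__":
    # Illustrative phoneme names only; not the actual POPCS inventory.
    phone2id = build_phone2id(["AP", "SP", "a", "ai", "n", "zh"])
    print(encode_phonemes(["zh", "ai", "SP"], phone2id))  # [5, 3, 1]
```

The resulting list of ints can then be turned into a tensor (for example with `torch.LongTensor(ids)`), so PyTorch never sees an array of strings. Failing loudly on unknown symbols also surfaces the EN-vs-ZH dictionary mismatch described in the reply above.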

We did exactly that (retrained on our own EN dataset) and got this error when running SVS inference:

```
Traceback (most recent call last):
  File "inference/svs/ds_e2e.py", line 71, in <module>
    DiffSingerE2EInfer.example_run(c)
  File "/media/wonder/6AF274CCF2749E4F/Wayne/DiffSinger/inference/svs/base_svs_infer.py", line 240, in example_run
    infer_ins = cls(hparams)
  File "/media/wonder/6AF274CCF2749E4F/Wayne/DiffSinger/inference/svs/base_svs_infer.py", line 35, in __init__
    self.model = self.build_model()
  File "inference/svs/ds_e2e.py", line 26, in build_model
    load_ckpt(model, hparams['work_dir'], 'model')
  File "/media/wonder/6AF274CCF2749E4F/Wayne/DiffSinger/utils/__init__.py", line 202, in load_ckpt
    cur_model.load_state_dict(state_dict, strict=strict)
  File "/home/wonder/anaconda3/envs/diffsinger_test/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1497, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for GaussianDiffusion:
    size mismatch for fs2.encoder_embed_tokens.weight: copying a param with shape torch.Size([72, 256]) from checkpoint, the shape in current model is torch.Size([64, 256]).
    size mismatch for fs2.encoder.embed_tokens.weight: copying a param with shape torch.Size([72, 256]) from checkpoint, the shape in current model is torch.Size([64, 256]).
```

The [72, 256] shape comes from retraining on our own EN dataset. We don't know where the torch.Size([64, 256]) of the current model is coming from. Any tips?

michaellin99999, Oct 03 '22
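In the traceback, the first dimension of `encoder_embed_tokens.weight` is the number of token IDs in the encoder's embedding table (an embedding weight has shape `(num_embeddings, embedding_dim)`, here with 256 as the hidden size). So 72 versus 64 suggests the model built at inference time used a smaller phoneme dictionary than the one used for training; one plausible but unconfirmed cause is that the inference config still points at the original phoneme set rather than the retrained EN one. The sketch below is a hypothetical diagnostic, not part of DiffSinger: given parameter shapes from the checkpoint and from the freshly built model (e.g. `{k: tuple(v.shape) for k, v in state_dict.items()}`), it lists every mismatch, much as `load_state_dict` reports them.

```python
from typing import Dict, List, Tuple

Shape = Tuple[int, ...]


def find_shape_mismatches(ckpt_shapes: Dict[str, Shape], model_shapes: Dict[str, Shape]) -> List[str]:
    """Report parameters present in both mappings whose shapes differ.

    Both arguments map parameter names to shapes (hypothetical inputs you
    would build from the checkpoint's and the model's state dicts).
    """
    report = []
    for name in sorted(ckpt_shapes.keys() & model_shapes.keys()):
        if ckpt_shapes[name] != model_shapes[name]:
            report.append(f"{name}: checkpoint {ckpt_shapes[name]} vs model {model_shapes[name]}")
    return report


def embedding_vocab_size(shapes: Dict[str, Shape], key: str) -> int:
    # An embedding weight has shape (num_embeddings, embedding_dim), so the
    # first entry is the number of token IDs the table can look up.
    return shapes[key][0]


if __name__ == "__main__":
    key = "fs2.encoder_embed_tokens.weight"
    # "example.layer_norm.weight" is a made-up placeholder parameter.
    ckpt = {key: (72, 256), "example.layer_norm.weight": (256,)}
    model = {key: (64, 256), "example.layer_norm.weight": (256,)}
    for line in find_shape_mismatches(ckpt, model):
        print(line)
    print("vocab gap:", embedding_vocab_size(ckpt, key) - embedding_vocab_size(model, key))
```

With the shapes from the traceback this prints the single embedding mismatch and a gap of 8 token IDs. If that holds, building the inference model from the same phoneme set and config used to train the EN checkpoint should make the shapes agree.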

@michaellin99999 Hello, did you ever get this working?

manhdoan291, Dec 14 '23