
custom phone_set file

Open michaellin99999 opened this issue 2 years ago • 1 comments

Hi, with data preview we have created 72 phonemes. Is there a way to train the model so that it doesn't use the existing phone_set file with 62 phonemes and can instead use up to 72 phonemes?

Thanks

michaellin99999 avatar Sep 28 '22 09:09 michaellin99999

I get this error when using the SVS inference:

size mismatch for fs2.encoder_embed_tokens.weight: copying a param with shape torch.Size([72, 256]) from checkpoint, the shape in current model is torch.Size([64, 256]).

size mismatch for fs2.encoder.embed_tokens.weight: copying a param with shape torch.Size([72, 256]) from checkpoint, the shape in current model is torch.Size([64, 256]).

What is the "current model" referred to in this error message? [72, 256] is our trained model. I'm tracing the code and cannot find which model [64, 256] refers to. When I train the model with my own dataset, do I need to train something else as well?

michaellin99999 avatar Oct 03 '22 10:10 michaellin99999
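
A note on reading the error above: in the message produced by PyTorch's `load_state_dict`, "checkpoint" is the saved weights and "current model" is the module instance the weights are being loaded into, which the inference code constructs before loading. So [64, 256] is most likely not a second trained model but an embedding table built at inference time. The assumption here, not verified against the DiffSinger code, is that its row count follows the phone set the inference run picked up (possibly plus a few reserved tokens). The sketch below only parses the two shapes out of such a line so they can be compared; `parse_size_mismatch` and `explain` are hypothetical helper names, not part of DiffSinger or PyTorch.

```python
import re

# Matches the "size mismatch" lines reported by torch.nn.Module.load_state_dict.
_MISMATCH = re.compile(
    r"size mismatch for (?P<name>[\w.]+): copying a param with shape "
    r"torch\.Size\(\[(?P<ckpt>[\d, ]+)\]\) from checkpoint, "
    r"the shape in current model is torch\.Size\(\[(?P<model>[\d, ]+)\]\)"
)


def _to_shape(text):
    """Turn '72, 256' into (72, 256)."""
    return tuple(int(part) for part in text.split(","))


def parse_size_mismatch(line):
    """Return (param_name, checkpoint_shape, model_shape), or None if no match."""
    match = _MISMATCH.search(line)
    if match is None:
        return None
    return (
        match.group("name"),
        _to_shape(match.group("ckpt")),
        _to_shape(match.group("model")),
    )


def explain(line):
    """Describe which side of the load has which number of embedding rows."""
    parsed = parse_size_mismatch(line)
    if parsed is None:
        return "not a size-mismatch line"
    name, ckpt, model = parsed
    return (
        f"{name}: checkpoint (trained weights) has {ckpt[0]} rows, "
        f"model built at load time has {model[0]} rows"
    )


if __name__ == "__main__":
    msg = (
        "size mismatch for fs2.encoder.embed_tokens.weight: copying a param "
        "with shape torch.Size([72, 256]) from checkpoint, the shape in "
        "current model is torch.Size([64, 256])."
    )
    print(explain(msg))
```

If the assumption holds, the thing to check is which phone_set file and config the inference run loads, rather than looking for another model to train.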