DiffSinger
custom phone_set file
Hi, with data preview we have created 72 phonemes. Is there a way to train the model so that it does not use the existing phone_set file with 62 phonemes and can use up to 72 phonemes?
Thanks
I get this error when using SVS inference:
size mismatch for fs2.encoder_embed_tokens.weight: copying a param with shape torch.Size([72, 256]) from checkpoint, the shape in current model is torch.Size([64, 256]).
size mismatch for fs2.encoder.embed_tokens.weight: copying a param with shape torch.Size([72, 256]) from checkpoint, the shape in current model is torch.Size([64, 256]).
What is the "current model" referred to in this error message? [72, 256] is our trained model. I'm tracing the code and cannot find what model [64, 256] refers to. When I train the model with my own dataset, do I need to train something else?
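
For reference, in a PyTorch size-mismatch message the "current model" is the model object that was just constructed in code and is receiving the checkpoint weights, not a second trained model. The first dimension of a token embedding is the vocabulary size used when that object was built, so a 64-row embedding suggests that inference built the model from a different phoneme list than the 72-entry one used for training. That reading is an assumption here, as is any guess about how 62 phonemes become 64 rows. The sketch below only shows the comparison itself. `find_shape_mismatches` is a hypothetical helper, not part of DiffSinger, and the shape tables are copied from the error text above.

```python
from typing import Dict, List, Tuple

Shape = Tuple[int, ...]


def find_shape_mismatches(
    checkpoint_shapes: Dict[str, Shape],
    model_shapes: Dict[str, Shape],
) -> List[Tuple[str, Shape, Shape]]:
    """Return (name, checkpoint_shape, model_shape) for every parameter
    present in both mappings whose shapes differ."""
    mismatches = []
    for name, ckpt_shape in checkpoint_shapes.items():
        model_shape = model_shapes.get(name)
        if model_shape is not None and model_shape != ckpt_shape:
            mismatches.append((name, ckpt_shape, model_shape))
    return mismatches


# Shapes taken from the error message: the checkpoint was trained with a
# 72-entry vocabulary, the model built at inference time has 64 entries.
checkpoint_shapes = {
    "fs2.encoder_embed_tokens.weight": (72, 256),
    "fs2.encoder.embed_tokens.weight": (72, 256),
}
model_shapes = {
    "fs2.encoder_embed_tokens.weight": (64, 256),
    "fs2.encoder.embed_tokens.weight": (64, 256),
}

for name, ckpt, model in find_shape_mismatches(checkpoint_shapes, model_shapes):
    print(f"{name}: checkpoint {ckpt} vs current model {model}")
```

With real objects, the two mappings could be built with something like `{k: tuple(v.shape) for k, v in state_dict.items()}` on the loaded checkpoint's state dict and on `model.state_dict()`. Where the state dict sits inside the checkpoint file is not shown in this thread. If the mismatch appears only in the embedding rows, the thing to check is which phone_set file the inference config points to.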