MaxMax2016
4. Cut the audio into clips shorter than 30 seconds for Whisper.
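A minimal sketch of that cutting step, assuming the audio is already loaded as a flat sequence of samples. The function name `split_points` and the 29-second default are illustrative assumptions, not part of the project's scripts; the only requirement taken from the comment above is that each clip stays under 30 seconds.

```python
def split_points(num_samples, sample_rate, max_seconds=29.0):
    """Return (start, end) sample index pairs, each covering < 30 s of audio.

    Hypothetical helper: max_seconds defaults to 29.0 to stay safely
    below the 30-second limit mentioned for Whisper.
    """
    if num_samples < 0 or sample_rate <= 0:
        raise ValueError("num_samples must be >= 0 and sample_rate must be > 0")
    if not 0 < max_seconds < 30:
        raise ValueError("max_seconds must be between 0 and 30")
    max_len = int(max_seconds * sample_rate)
    # Walk through the signal in fixed-size steps; the last clip may be shorter.
    return [
        (start, min(start + max_len, num_samples))
        for start in range(0, num_samples, max_len)
    ]


# Example: 70 seconds of 16 kHz audio.
segments = split_points(num_samples=70 * 16000, sample_rate=16000)
```

Each pair can then be used to slice the waveform (for example `wav[start:end]`) before writing the clip to disk.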
Are there extra blank lines in the training file train.txt?
There is no VITS here, just a BigVGAN. After the upsample layers, the speaker information is used to change x with weights and biases.
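A rough NumPy sketch of what "change x with weights and biases" could look like: a per-channel scale and shift derived from the speaker embedding. The projection matrices `w_scale` and `w_shift`, the shapes, and the function name are assumptions made for illustration; the actual model code may differ.

```python
import numpy as np


def speaker_modulate(x, spk, w_scale, w_shift):
    """Apply a speaker-dependent affine transform to a feature map.

    x:       (channels, time) features after an upsample layer
    spk:     (spk_dim,) speaker embedding
    w_scale: (channels, spk_dim) hypothetical projection producing the weights
    w_shift: (channels, spk_dim) hypothetical projection producing the biases
    """
    scale = w_scale @ spk  # (channels,)
    shift = w_shift @ spk  # (channels,)
    # Broadcast the per-channel weight and bias over the time axis.
    return x * scale[:, None] + shift[:, None]


rng = np.random.default_rng(0)
x = rng.standard_normal((4, 10))
spk = rng.standard_normal(8)
w_scale = rng.standard_normal((4, 8))
w_shift = rng.standard_normal((4, 8))
out = speaker_modulate(x, spk, w_scale, w_shift)
```

The output keeps the shape of `x`, so the operation can sit between upsample stages without changing the rest of the network.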
LoRA is a low-rank adapter. Here is another adapter, from Microsoft: [AdaSpeech: Adaptive Text to Speech for Custom Voice](https://arxiv.org/pdf/2103.00993.pdf). See also [Adapter-Based Extension of Multi-Speaker Text-to-Speech Model for New Speakers](https://arxiv.org/abs/2211.00585).
lora_svc is not real LoRA; the name is only meant to get SVC developers thinking about LoRA.
Timbre A * x + timbre B * (1 - x) -> new timbre, 0
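The mixing formula above can be sketched as a linear interpolation between two speaker embeddings. The function name `mix_timbre`, the toy two-dimensional embeddings, and the value x = 0.25 are illustrative assumptions; the constraint on x is cut off in the comment, so no range is enforced here.

```python
import numpy as np


def mix_timbre(spk_a, spk_b, x):
    """Return spk_a * x + spk_b * (1 - x) as a new speaker embedding."""
    spk_a = np.asarray(spk_a, dtype=float)
    spk_b = np.asarray(spk_b, dtype=float)
    if spk_a.shape != spk_b.shape:
        raise ValueError("both embeddings must have the same shape")
    return spk_a * x + spk_b * (1.0 - x)


# Toy embeddings, chosen only to make the arithmetic easy to follow.
new_spk = mix_timbre([1.0, 0.0], [0.0, 1.0], 0.25)
```

With x = 1 the result equals timbre A, and with x = 0 it equals timbre B.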
The validation set is empty. Cut the first five entries from the training set and put them into the validation set.
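A small sketch of that fix, assuming the file lists are plain text with one entry per line. The function name and the validation file name are hypothetical; only train.txt is named in this thread. As a side effect, the sketch also drops blank lines from the training list.

```python
from pathlib import Path


def move_head_to_valid(train_path, valid_path, n=5):
    """Move the first n non-blank entries of the training list to the validation list.

    Returns (remaining_train_count, valid_count). Blank lines are discarded.
    """
    train_path, valid_path = Path(train_path), Path(valid_path)
    lines = [
        ln for ln in train_path.read_text(encoding="utf-8").splitlines()
        if ln.strip()
    ]
    valid, train = lines[:n], lines[n:]
    # Write one entry per line, without leaving a trailing blank line.
    valid_path.write_text("".join(ln + "\n" for ln in valid), encoding="utf-8")
    train_path.write_text("".join(ln + "\n" for ln in train), encoding="utf-8")
    return len(train), len(valid)
```

A call such as `move_head_to_valid("train.txt", "valid.txt")` rewrites both files in place, so keeping a backup of the original training list first is a sensible precaution.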
Yes, you can use 48 kHz audio to extract pitch, but the parameters must be modified accordingly.
More than ten minutes is needed to fine-tune the pretrained model.