MaxMax2016

Results 243 comments of MaxMax2016

4 cut audio, less than 30 seconds for whisper

训练文件train.txt里面多了空白行?

there has no vits, just a bigvgan, after upsample layers use speaker info to change x with weights and biases.

lora is low rank adapter, here is another adapter from micsoft [AdaSpeech: Adaptive Text to Speech for Custom Voice](https://arxiv.org/pdf/2103.00993.pdf) or [Adapter-Based Extension of Multi-Speaker Text-to-Speech Model for New Speakers](https://arxiv.org/abs/2211.00585)

lora_svc is not real lora, use this name is just want svc developers to think about lora.

音色A * x +音色B * (1-x) ->新的音色, 0

yes, you can use 48k to get pitch, but parameters should be modified.

more than ten minutes to fine tune pretrain model.