drlor2k
Thanks for your response @v-nhandt21. I tried https://github.com/thinhlpg/vixtts-demo; it's a great attempt, but it lacks the stability I need. I actually forgot that I could fine-tune it :v
Hello @v-nhandt21, I have some takeaways from VITS2 and XTTS; could you share your opinion? 1. In terms of output audio quality, VITS may be better than XTTS. 2. XTTS...
Thank you for your response @v-nhandt21. I have another question, if you can help: I see that some speech2speech repos use a very small `val dataset` (2 records per voice),...
Thanks @aluminumbox for the reply, one more question! My German dataset is mixed with a bit of English, is this OK? Or should I use a pure German dataset?
> Try using something like this:
>
> ```python
> text = f"" + data["sentence"].strip()
> ```

Hi @MiXaiLL76, do you mean when adding a new language my data...
I think the author will update the code for this issue; in the meantime, here is a temporary solution you can refer to. 1. When starting a new training session, you need...
I thought the above code would make `step` go back to `current_step` when starting a new epoch. I modified it a bit.

```
# Save init checkpoints
info_dict = deepcopy(configs['train_conf'])...
```
Yes, you need to modify the `current_epoch` and `current_step` values in `cosyvoice.fromscratch.yaml` every time you resume training from a checkpoint. You can see `current_epoch` and `current_step` in the checkpoint filename...
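Since this comes up every time training is resumed, here is a minimal sketch of how you could automate the yaml edit. It assumes your checkpoint filenames encode both values in an `epoch_E_step_S` pattern (e.g. `epoch_3_step_12000.pt` — this pattern is an assumption, adjust the regex to your actual filenames). It patches the two values with plain text substitution so any special yaml tags in the config are left untouched:

```python
import re
from pathlib import Path

def update_resume_info(config_path: str, ckpt_name: str) -> None:
    """Rewrite current_epoch/current_step in the yaml before resuming.

    Assumes the checkpoint filename encodes both values, e.g.
    'epoch_3_step_12000.pt' -- adjust the regex if yours differs.
    """
    m = re.search(r"epoch_(\d+)_step_(\d+)", ckpt_name)
    if m is None:
        raise ValueError(f"cannot parse epoch/step from {ckpt_name!r}")
    epoch, step = m.group(1), m.group(2)

    cfg = Path(config_path)
    text = cfg.read_text()
    # Plain text substitution, so the rest of the yaml (including any
    # non-standard tags) is not re-serialized or reformatted.
    text = re.sub(r"current_epoch:\s*\d+", f"current_epoch: {epoch}", text)
    text = re.sub(r"current_step:\s*\d+", f"current_step: {step}", text)
    cfg.write_text(text)

update_resume_info("cosyvoice.fromscratch.yaml", "epoch_3_step_12000.pt")
```

Run it once before each resumed session; if your config stores the two keys under a nested section, the regex still works as long as the key names are unique in the file.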
Hello @rlenain, are you training only the `llm` model or the `flow` model as well? And how much GPU did you use for the Spanish training?