Due to recent development, maybe release training components/scripts?

Open asusdisciple opened this issue 1 year ago • 18 comments

I know the author did not want to share the training procedure for ethical reasons. But given recent developments (for example VITS), isn't that policy a bit outdated? There are already trainable tools out there that perform on the same level or even better. I think there is no point in blocking further development. What is your view on that?

asusdisciple avatar Sep 13 '23 12:09 asusdisciple

The training procedure is available, someone figured it out.

neonbjb avatar Sep 14 '23 01:09 neonbjb

Yeah, I found a high-level explanation, but to my knowledge it is still necessary to train all the models because the codebook apparently does not exist anymore. I do not fully understand all the complex subparts of the model, but apparently without this piece, training can only be done end-to-end. That's probably not an issue for people with expensive hardware, but for hobbyists like me it's quite a limitation not to be able to fine-tune. Nevertheless, thanks for all the effort in putting up a project like this! I really appreciate that people still produce quality open source stuff at a time when A.I. is dominated more and more by industry :)

asusdisciple avatar Sep 14 '23 15:09 asusdisciple

No I mean someone else (not me) put together a repo that allows you to fine tune the model.

neonbjb avatar Sep 14 '23 18:09 neonbjb

Ah, I see. I think I found the repo. I have access to a few A100s and will try to train and/or fine-tune the Tortoise model on my native language. If I succeed, I will make it publicly available!

asusdisciple avatar Sep 15 '23 12:09 asusdisciple

No I mean someone else (not me) put together a repo that allows you to fine tune the model.

Hey, can you send the link of that repo?

Shivamkumar285 avatar Oct 08 '23 06:10 Shivamkumar285

No I mean someone else (not me) put together a repo that allows you to fine tune the model.

Hey, can you send the link of that repo?

I'd like to see it too, thanks

NikitaKononov avatar Oct 11 '23 21:10 NikitaKononov

Hey, I'd like to see the repo, too. Thanks!

hairuo55 avatar Oct 13 '23 10:10 hairuo55

@asusdisciple you talked about wanting open sourced training code but went underground yourself?

RahulBhalley avatar Oct 13 '23 23:10 RahulBhalley

@RahulBhalley @hairuo55 @Shivamkumar285 I think I've found it, but not sure

mrq/ai-voice-cloning. It's not on GitHub; Google it and it's the first link in the search results.

NikitaKononov avatar Oct 14 '23 19:10 NikitaKononov

Thanks @NikitaKononov. The wiki information looks very, very promising! Hoping it runs fast on the M1 chip as well. 🤞🏽

RahulBhalley avatar Oct 15 '23 05:10 RahulBhalley

You can't train these models on a CPU, an M1, or low-end GPUs; you'll need bigger GPUs. You can fine-tune on lower-end GPUs with 15-24 GB of VRAM.

manmay-nakhashi avatar Oct 15 '23 06:10 manmay-nakhashi

I have access to an RTX 4090. Only problem is inference in production environments. This repo's Tortoise inference speed is sadly insanely slow!

RahulBhalley avatar Oct 15 '23 09:10 RahulBhalley

@NikitaKononov Thanks a lot for the information about mrq/ai-voice-cloning.

hairuo55 avatar Oct 15 '23 09:10 hairuo55

@neonbjb Hi, would you consider releasing the original scripts, since someone has already figured out the training scripts? All training scripts right now are licensed under GPL/AGPL, and I was wondering if the official ones could be released under Apache 2.0. Thanks!

fakerybakery avatar Oct 18 '23 23:10 fakerybakery

@fakerybakery I've managed to get fine-tuning for the autoregressive model working, using some of the original author's code and the DLAS library. I'm in the process of cleaning it up a bit more, but a working version is here: https://github.com/andrewsilva9/tune_tortoise_autoregressor (it uses Apache 2.0 because I inherited code from the original author, so I carried the license forward).

andrewsilva9 avatar Nov 22 '23 03:11 andrewsilva9

Great! Thanks!

fakerybakery avatar Nov 22 '23 03:11 fakerybakery

Also @andrewsilva9, you mentioned in the README that some code comes from the mrq repo. Does this mean the AGPL license applies?

fakerybakery avatar Nov 22 '23 03:11 fakerybakery

Let me double check my code to be sure, but I think everything is adapted/copied from DLAS or TorToiSe. I leaned heavily on the mrq repo to see how things connect and to track things through a debugger. I'll double check and update the README/scripts tomorrow!

andrewsilva9 avatar Nov 22 '23 04:11 andrewsilva9