finetuning code?
Thanks for the release!
Can you give any information on the training code?
+1 cc @manmay-nakhashi
Do we have any update? @ShaanveerS
First off, just want to say a heartfelt thanks for all the love and excitement around this model. We’re genuinely grateful for the open source community, and honestly, a lot of what we do is inspired by your projects, questions, and experiments.
We’ve had quite a few people asking for the training code. I get it, openness is a big part of what makes this space awesome. I want to be real with you: building these models is not cheap. Training, testing, and keeping everything running takes a lot of resources, both time and money. In order to keep this going and actually be able to release more cool stuff in the future, we need a way to keep the project sustainable.
For now, that means we’re not releasing the training code, and fine-tuning will be something we support through our paid API (https://app.resemble.ai). This helps us pay the bills and keep pushing out models that (hopefully) benefit everyone.
We love being part of this community, and we’re doing our best to strike a balance between openness and being able to keep doing the work. We hope you understand. If you have thoughts, feedback, or just want to chat, our door is always open.
+1 It would be good to have the ability to fine-tune for different language support.
Fine-tuning this model is easy. You just need to fine-tune the T3 block (maybe the CFM too, but I don't think it's necessary). Since an audio tokenizer is already shared, it's just LLM fine-tuning in the end.
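Roughly, the setup looks like this. This is a minimal sketch, assuming the `chatterbox` package's `ChatterboxTTS.from_pretrained` entry point and its `t3` / `s3gen` / `ve` submodules; check the names against the actual release before relying on it:

```python
# Minimal sketch of "freeze everything except T3", assuming the chatterbox
# package exposes ChatterboxTTS.from_pretrained and .t3/.s3gen/.ve submodules.
import torch
from chatterbox.tts import ChatterboxTTS

model = ChatterboxTTS.from_pretrained(device="cuda")

# Freeze the speech-token-to-waveform stack (s3gen, incl. the CFM decoder)
# and the voice encoder; only the T3 text-to-speech-token transformer trains.
for module in (model.s3gen, model.ve):
    for p in module.parameters():
        p.requires_grad = False

optimizer = torch.optim.AdamW(
    (p for p in model.t3.parameters() if p.requires_grad), lr=1e-5
)

# From here it is ordinary causal-LM fine-tuning: encode transcripts with the
# shipped text tokenizer, encode audio into speech tokens with the shared S3
# tokenizer, and train T3 to predict speech tokens from text tokens.
```

The actual data pipeline and loss are what the fine-tuning scripts linked later in this thread implement.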
@talipturkmen Do you have a script that you could share?
I am currently working on fine-tuning the t3 block for German. Looks promising so far. Early tests indicated that the s3 block might need fine-tuning as well to get rid of the accent. I will open-source the script as soon as I am sure it's doing what it's supposed to do.
https://github.com/alisson-anjos/chatterbox-finetune
Did you test it?
This is still in progress; the most important fine-tuning script, the one for t3, is still missing. @stlohrey is working on it, and the script he is making is more advanced than mine.
So, the fine-tuning script is working; you can find it here: https://github.com/stlohrey/chatterbox-finetuning. Keep in mind that it is WIP and not very clean. I made a German model with it (also WIP), which you can find at https://huggingface.co/stlohrey/chatterbox_de.
Dear @stlohrey, could you please share some insights on how to prepare the data and structure it appropriately? Additionally, we'd appreciate it if you could explain how to prepare a tokenizer as well.
@stlohrey thank you for doing this and making it public! In your experiments with German, is it enough to fine-tune just the t3 part, or does one need to fine-tune s3gen as well?
What concerns should I expect with Arabic, a UTF-8-based non-Latin-script language, and with its data preparation? @stlohrey
@stlohrey, I am working with Indian languages. The fine-tuned model is generating audio, but the speech does not correspond to the input text. How did you handle tokenization for German?
Check the tokenizer in the Hugging Face repo. German letters are already present, so there is no need to change anything; the model was already trained with this config.
I tried to train on some other EU languages that need a partial tokenizer change because a few letters are missing. In that case it's best to replace letters in the vocab that you will not use with the missing ones, so the model keeps the knowledge from its previous training for the common alphabet (see the sketch below).
I have no experience with Indian languages. If they use a completely different alphabet than the Latin one, you will need to modify the tokenizer completely; check the recently released video about Japanese training: https://www.youtube.com/watch?v=G0BoZfO8a5c
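For illustration, here is a minimal sketch of that letter swap. It assumes tokenizer.json follows the Hugging Face `tokenizers` layout with the vocabulary stored under `model["vocab"]`, and the characters chosen ("q" out, "ł" in) are purely illustrative:

```python
# Minimal sketch of reusing an unused symbol's id for a missing letter,
# assuming tokenizer.json follows the Hugging Face `tokenizers` layout
# (vocabulary under model["vocab"]). "q" and "ł" are illustrative only.
import json

with open("tokenizer.json", encoding="utf-8") as f:
    tok = json.load(f)

vocab = tok["model"]["vocab"]   # maps token string -> integer id

unused, missing = "q", "ł"      # pick characters per your training data
token_id = vocab.pop(unused)    # free the unused symbol's id
vocab[missing] = token_id       # map the missing letter onto it

with open("tokenizer_patched.json", "w", encoding="utf-8") as f:
    json.dump(tok, f, ensure_ascii=False, indent=2)
```

Because the id is reused rather than appended, the embedding table keeps its size, and all other letters keep the embeddings learned during the original training.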
I liked the out-of-the-box output of the model. Although I don't love having to pay for an API to train a different voice or another language, I can't argue with it or think of a better approach that supports both the development and the community. I hope you come up with a better option; I really want to try training a different voice.
Thank you @C00reNUT, got a good starting point here.