
finetuning code?

Open ShaanveerS opened this issue 6 months ago • 26 comments

Thanks for the release!

Can you give any information on the training code?

ShaanveerS avatar May 28 '25 19:05 ShaanveerS

+1 cc @manmay-nakhashi

fakerybakery avatar May 29 '25 01:05 fakerybakery

+1

JohnHerry avatar May 29 '25 01:05 JohnHerry

+1

Shine1i avatar May 29 '25 04:05 Shine1i

+1

tarun7r avatar May 29 '25 10:05 tarun7r

+1

freds0 avatar May 29 '25 11:05 freds0

+1!

NeuroDonu avatar May 29 '25 17:05 NeuroDonu

+1M

thewh1teagle avatar May 29 '25 21:05 thewh1teagle

Do we have any update? @ShaanveerS

cod3r0k avatar May 29 '25 23:05 cod3r0k

+1

bmwas avatar May 29 '25 23:05 bmwas

+1

ichDaheim avatar May 30 '25 12:05 ichDaheim

First off, just want to say a heartfelt thanks for all the love and excitement around this model. We’re genuinely grateful for the open source community, and honestly, a lot of what we do is inspired by your projects, questions, and experiments.

We’ve had quite a few people asking for the training code. I get it, openness is a big part of what makes this space awesome. I want to be real with you: building these models is not cheap. Training, testing, and keeping everything running takes a lot of resources, both time and money. In order to keep this going and actually be able to release more cool stuff in the future, we need a way to keep the project sustainable.

For now, that means we’re not releasing the training code, and fine-tuning will be something we support through our paid API (https://app.resemble.ai). This helps us pay the bills and keep pushing out models that (hopefully) benefit everyone.

We love being part of this community, and we’re doing our best to strike a balance between openness and being able to keep doing the work. We hope you understand. If you have thoughts, feedback, or just want to chat, our door is always open.

TediPapajorgji avatar May 30 '25 18:05 TediPapajorgji

+1 It would be good to have the ability to fine-tune for additional language support.

chigkim avatar Jun 01 '25 12:06 chigkim

Fine-tuning this model is easy. You just need to fine-tune the T3 block (maybe the CFM too, but I don't think it's necessary). Since an audio tokenizer is already shared, it's just LLM fine-tuning in the end.
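For illustration, a rough sketch of what that LLM-style loop could look like. It assumes the `chatterbox-tts` package exposes the T3 transformer on the loaded model; `make_t3_batches()` and the `t3.loss(...)` call are placeholders for your own data pipeline and the actual T3 training interface, not confirmed APIs:

```python
# Rough sketch only: treat T3 fine-tuning as ordinary LM fine-tuning.
# make_t3_batches() and t3.loss(...) are placeholders, not real APIs.
import torch
from chatterbox.tts import ChatterboxTTS

model = ChatterboxTTS.from_pretrained(device="cuda")
t3 = model.t3  # the text -> speech-token transformer

# Freeze the flow/vocoder stack; only T3 gets updated.
for p in model.s3gen.parameters():
    p.requires_grad = False

optimizer = torch.optim.AdamW(t3.parameters(), lr=1e-5)

for text_tokens, speech_tokens in make_t3_batches():  # placeholder pipeline
    loss = t3.loss(text_tokens, speech_tokens)        # placeholder API
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```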

talipturkmen avatar Jun 02 '25 21:06 talipturkmen

@talipturkmen Do you have a script that you could share?

chigkim avatar Jun 03 '25 14:06 chigkim

I am currently working on fine-tuning the T3 block for German. It looks promising so far. Early tests indicated that the S3 block might need fine-tuning as well to get rid of the accent. I will open-source the script as soon as I am sure it's doing what it's supposed to do.

stlohrey avatar Jun 04 '25 17:06 stlohrey

https://github.com/alisson-anjos/chatterbox-finetune

C00reNUT avatar Jun 04 '25 20:06 C00reNUT

Did you test it?

https://github.com/alisson-anjos/chatterbox-finetune

cod3r0k avatar Jun 05 '25 01:06 cod3r0k

https://github.com/alisson-anjos/chatterbox-finetune

This is still in progress; the most important fine-tuning script, the one for T3, is still missing. @stlohrey is working on it, and the script he is making is more advanced than mine.

alisson-anjos avatar Jun 05 '25 12:06 alisson-anjos

So, the fine-tuning script is working; you can find it here: https://github.com/stlohrey/chatterbox-finetuning. Keep in mind that it is WIP and not very clean. I made a German model with it (also WIP), which you can find at https://huggingface.co/stlohrey/chatterbox_de.
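In case it helps anyone try it, a hedged usage sketch: it assumes the fine-tuned repo keeps the same file layout as the base ResembleAI/chatterbox checkpoint (t3_cfg.safetensors, s3gen.safetensors, ve.safetensors, tokenizer.json, conds.pt), which may not hold:

```python
# Hedged usage sketch: load a fine-tuned checkpoint directory, assuming it
# mirrors the file layout of the base ResembleAI/chatterbox HF repo.
import torchaudio
from huggingface_hub import snapshot_download
from chatterbox.tts import ChatterboxTTS

ckpt_dir = snapshot_download("stlohrey/chatterbox_de")  # layout is an assumption
model = ChatterboxTTS.from_local(ckpt_dir, device="cuda")

wav = model.generate("Guten Tag, wie geht es Ihnen?")
torchaudio.save("out_de.wav", wav, model.sr)
```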

stlohrey avatar Jun 06 '25 19:06 stlohrey

Dear @stlohrey, could you please share some insights on how to prepare and structure the data appropriately? Additionally, we'd appreciate it if you could explain how to prepare a tokenizer as well.

cod3r0k avatar Jun 07 '25 04:06 cod3r0k

@stlohrey thank you for doing this and making it public! In your experiments with German, is it enough to fine-tune just the T3 part, or does one need to fine-tune s3gen as well?

C00reNUT avatar Jun 07 '25 21:06 C00reNUT

What concerns should I keep in mind for Arabic, as a non-Latin UTF-8 script, and for its data preparation? @stlohrey

cod3r0k avatar Jun 12 '25 07:06 cod3r0k

@stlohrey,

I am working with Indian languages. The fine-tuned model is generating audio, but the speech does not correspond to the input text. How did you handle tokenization for German?
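A quick way to check whether this is a tokenizer-coverage problem, as a hedged sketch (the `tokenizer.json` path and the unk-token name are assumptions about the released checkpoint):

```python
# Hedged diagnostic: see what the released tokenizer actually does with your
# script. If most tokens come back unknown, the model never saw the text,
# which would explain audio that ignores the input.
from tokenizers import Tokenizer

tok = Tokenizer.from_file("tokenizer.json")  # from the chatterbox checkpoint
enc = tok.encode("नमस्ते दुनिया")  # any sample in your target script
print(enc.tokens)
unk = sum(t == "[UNK]" for t in enc.tokens)  # unk name is an assumption
print(f"{unk}/{len(enc.tokens)} tokens unknown")
```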

anjalyv avatar Jun 19 '25 02:06 anjalyv

Check the tokenizer in the Hugging Face repo. German letters are already present, so there is no need to change anything; the model was already trained with this config.

I tried to train on some other EU languages that need partial tokenizer changes, since a few of the letters are missing. It's best to replace letters in the vocab that you will not use with the missing ones, so the model keeps the knowledge from its previous training for the common alphabet...

I have no experience with Indian languages. If they use a completely different alphabet than the Latin one, you will need to modify the tokenizer completely. Check the recently released video about Japanese training: https://www.youtube.com/watch?v=G0BoZfO8a5c
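As a hedged sketch of that letter-swap trick: this assumes tokenizer.json uses the Hugging Face `tokenizers` JSON layout with single-character entries in `model.vocab`, and note that BPE merges touching the replaced letters may need the same treatment:

```python
# Hedged sketch: patch missing letters over vocab entries you will never
# use, keeping token IDs stable so the T3 embeddings still line up.
import json

with open("tokenizer.json", encoding="utf-8") as f:
    tok = json.load(f)

vocab = tok["model"]["vocab"]  # assumes HF `tokenizers` JSON layout

missing = [ch for ch in "ăâîșț" if ch not in vocab]  # e.g. Romanian letters
unused = ["q", "x"]  # letters you are sure never occur in your data

for old, new in zip(unused, missing):
    vocab[new] = vocab.pop(old)  # new letter inherits the old token ID

with open("tokenizer_patched.json", "w", encoding="utf-8") as f:
    json.dump(tok, f, ensure_ascii=False)
```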

C00reNUT avatar Jun 19 '25 09:06 C00reNUT

(Quoting @TediPapajorgji's reply above in full.)

I liked the out-of-the-box output of the model. Although I dislike having to pay for an API in order to train a different voice or another language, I can't argue with it or think of a better method that supports both the development and the community. I hope you come up with a better approach; I really want to try training a different voice.

rosx27 avatar Jun 24 '25 12:06 rosx27

(Quoting @C00reNUT's tokenizer advice above.)

Thank you @C00reNUT, I've got a good starting point here.

anjalyv avatar Jun 27 '25 08:06 anjalyv