CosyVoice: add another special_tokens

Open 0913ktg opened this issue 9 months ago • 4 comments

Hello,

I would like to train a model with new tokens, each paired with audio of the sound it represents, that are not currently included in the additional_special_tokens of the QwenTokenizer.

For example, I want to add a token like [cry] and include the corresponding crying sound in the training dataset.
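To make the setup concrete, here is a toy sketch of what adding such a token involves on the text side: the tag gets its own ID, the tokenizer keeps it in one piece, and the embedding table gains one row. This is not CosyVoice's or QwenTokenizer's actual code. The vocabulary, `add_special_token`, and `encode` are hypothetical stand-ins in plain Python, and the mean-of-rows initialisation is just one common choice that I am assuming here.

```python
import random
import re

# Toy stand-ins (hypothetical): a real setup would use the QwenTokenizer
# vocabulary and the text-embedding matrix of the pre-trained model.
vocab = {"<unk>": 0, "hello": 1, "world": 2, "[laughter]": 3}
special_tokens = ["[laughter]"]
embed_dim = 4
random.seed(0)
embeddings = [[random.uniform(-1, 1) for _ in range(embed_dim)] for _ in vocab]


def add_special_token(token):
    """Register `token` with a new ID and append a matching embedding row."""
    if token in vocab:
        return vocab[token]
    vocab[token] = len(vocab)
    special_tokens.append(token)
    # Assumption: start the new row at the mean of the existing rows. It is
    # only a starting point and still has to be learned from paired audio.
    mean_row = [sum(col) / len(embeddings) for col in zip(*embeddings)]
    embeddings.append(mean_row)
    return vocab[token]


def encode(text):
    """Split on registered special tokens first so they stay in one piece."""
    pattern = "(" + "|".join(re.escape(t) for t in special_tokens) + ")"
    ids = []
    for piece in re.split(pattern, text):
        if piece in special_tokens:
            ids.append(vocab[piece])
        else:
            # Toy fallback: unknown words map to <unk>. A BPE tokenizer would
            # instead break an unregistered tag into several sub-word pieces.
            ids.extend(vocab.get(w, vocab["<unk>"]) for w in piece.split())
    return ids


before = encode("hello [cry] world")   # tag not registered yet
cry_id = add_special_token("[cry]")
after = encode("hello [cry] world")    # tag now has its own ID
```

The point of the sketch is that registering the token only creates an untrained ID and embedding row; the model has no association between that row and a crying sound until it sees enough paired examples during training.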

Could you advise how much audio data is typically required for the model to accurately generate the intended sound when a custom token such as [cry] appears in the input text?

Would this require retraining or fine-tuning the pre-trained model extensively?

I've already tried training with approximately 200 audio samples, but I haven't observed any noticeable improvement or the desired output.

I would greatly appreciate any suggestions or recommendations you might have.

Thank you!