CosyVoice

Add another special token

Open · 0913ktg opened this issue 9 months ago · 4 comments

Hello,

I would like to train the model with new tokens, plus the corresponding audio, that are not included in the additional_special_tokens of the QwenTokenizer.

For example, I want to add a token like [cry] along with the corresponding crying sound in the training dataset.
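A common way to register a new marker such as [cry] is to append it to the tokenizer's special-token list and then grow the model's text-embedding table by one row, so the new id has its own trainable vector. If the embedding table is not resized, the new id points past the end of the table. With a Hugging Face tokenizer and model this is usually done with `tokenizer.add_special_tokens({"additional_special_tokens": [...]})` followed by `model.resize_token_embeddings(len(tokenizer))`; whether CosyVoice's QwenTokenizer and LLM expose those exact steps is not confirmed here. The stdlib-only sketch below shows the idea on toy data. `extend_vocab` is a hypothetical helper, it assumes token ids are contiguous and index rows of the embedding table directly, and the mean-of-existing-rows initialization is one common heuristic, not a documented CosyVoice procedure.

```python
from typing import Dict, List, Tuple


def extend_vocab(
    vocab: Dict[str, int],
    embeddings: List[List[float]],
    new_tokens: List[str],
) -> Tuple[Dict[str, int], List[List[float]]]:
    """Hypothetical helper: add tokens to a token->id map and grow the embedding table.

    Assumes ids are contiguous (0..N-1) and id i maps to embeddings[i].
    Each new row starts as the mean of the existing rows, a common
    initialization heuristic. Inputs are copied, not mutated.
    """
    if not embeddings:
        raise ValueError("embedding table must be non-empty")
    dim = len(embeddings[0])
    # Column-wise mean over all existing embedding rows.
    mean_row = [sum(row[i] for row in embeddings) / len(embeddings) for i in range(dim)]

    new_vocab = dict(vocab)
    new_embeddings = [list(row) for row in embeddings]
    for tok in new_tokens:
        if tok in new_vocab:
            continue  # token already known: keep its existing id and vector
        new_vocab[tok] = len(new_embeddings)  # next free id = next row index
        new_embeddings.append(list(mean_row))
    return new_vocab, new_embeddings


if __name__ == "__main__":
    # Toy vocabulary and 2-dimensional embeddings, for illustration only.
    vocab = {"<pad>": 0, "[laughter]": 1, "hello": 2}
    emb = [[0.0, 0.0], [1.0, 2.0], [2.0, 4.0]]
    vocab2, emb2 = extend_vocab(vocab, emb, ["[cry]", "[laughter]"])
    print(vocab2["[cry]"], emb2[vocab2["[cry]"]])  # 3 [1.0, 2.0]
```

Here "[laughter]" is skipped because it already exists, and "[cry]" gets id 3 with a starting vector of [1.0, 2.0]. Note that registering the token only gives the model a slot for it; the new row still has to be learned during training from audio where the crying sound is aligned with the token.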

Could you advise how much audio data is typically required for the model to reliably generate the intended sound when the input contains a custom token such as [cry]?

Would this require retraining the pre-trained model, or extensive fine-tuning?

I have already tried training with approximately 200 audio samples, but I haven't observed any noticeable improvement or the desired result.

I would greatly appreciate any suggestions or recommendations you might have.

Thank you!

0913ktg, Mar 04 '25 05:03

Please join our DingTalk chat group; 陈谦 (Chen Qian) may be able to answer your question.

aluminumbox, Mar 06 '25 08:03

This issue is stale because it has been open for 30 days with no activity.

github-actions[bot], Apr 06 '25 02:04

@0913ktg were you able to add the new tokens correctly?

haziyevv, May 14 '25 11:05

@haziyevv Hi, no, I haven't been able to add new tokens.

0913ktg, May 16 '25 00:05