
Voice to semantic

Open huydung179 opened this issue 2 years ago • 3 comments

If I understood correctly, you used a custom semantic-voice dataset to train your HuBERT model. Can you tell me how to create this dataset, and especially how to get the semantics from a voice? Many thanks for this work.

huydung179 · Jun 27 '23 13:06

The dataset creation code is up at https://github.com/gitmylo/bark-data-gen

To get the semantics from a voice, you have to use a trained HuBERT quantizer model. See the problem? The quantizer cannot be improved for a specific voice, because all you could train on is its previous outputs.

To understand why it works, you need to understand how bark works: https://github.com/gitmylo/audio-webui/wiki/how-bark-works. The quantizer model just converts recognized speech patterns into a format that bark understands and is able to complete, essentially cloning a voice.
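
As a rough illustration of that conversion step, here is a toy sketch, not the repository's actual model. It assumes the speech patterns arrive as one feature vector per audio frame and that each frame is mapped to the highest-scoring token id. The function name, the plain linear scoring, and the tiny dimensions are all hypothetical; the real quantizer is a trained neural network.

```python
def quantize_frames(features, weights, bias):
    """Map each frame's feature vector to a discrete token id.

    features: list of frames, each a list of floats (length = feature size)
    weights:  feature_size x num_tokens matrix as a list of rows (hypothetical)
    bias:     list of num_tokens floats (hypothetical)
    """
    tokens = []
    for frame in features:
        # One score (logit) per candidate token: dot(frame, weight column) + bias.
        logits = [
            sum(x * w for x, w in zip(frame, column)) + b
            for column, b in zip(zip(*weights), bias)
        ]
        # The frame becomes the id of the best-scoring token.
        tokens.append(max(range(len(logits)), key=logits.__getitem__))
    return tokens


# Toy example: 3 frames, 2 features per frame, 3 possible tokens.
features = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
weights = [[1.0, 0.0, 0.5], [0.0, 1.0, 0.5]]
bias = [0.0, 0.0, 0.1]
tokens = quantize_frames(features, weights, bias)
print(tokens)  # [0, 1, 2]
```

The resulting list of token ids plays the role of the semantic sequence that bark can continue from.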

gitmylo · Jun 27 '23 13:06

Dear gitmylo, I would also like to know how to create semantic data from wav source files. I have gathered Korean wav files and need to generate semantic data from them, and I also need to pre-train on both the semantic data and the wav files. Could you explain the details? I really appreciate your great work.

iamhch24 · Sep 01 '23 06:09

If you want to train, you'll need a text dataset in the language you want to train for. For example, you can modify the bark-data-gen code to load text files in another language. Then prepare the dataset and train, as explained in https://github.com/gitmylo/bark-voice-cloning-HuBERT-quantizer#how-do-i-train-it-myself, and follow the other steps there.
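
A semantic-voice dataset ultimately consists of matching pairs of semantic data and audio. A minimal sketch of collecting such pairs by filename is below. It assumes a hypothetical layout in which `name.npy` holds the semantic tokens and `name.wav` holds the matching audio; the function name and the layout are illustrative only, and the actual output of bark-data-gen may be organized differently.

```python
from pathlib import Path


def pair_training_files(semantic_dir, wav_dir):
    """Return (semantic_file, wav_file) pairs that share a file stem.

    Assumed (hypothetical) layout: semantic_dir/NAME.npy, wav_dir/NAME.wav.
    Semantic files without a matching wav are skipped.
    """
    semantic_dir, wav_dir = Path(semantic_dir), Path(wav_dir)
    pairs = []
    for semantic_file in sorted(semantic_dir.glob("*.npy")):
        wav_file = wav_dir / (semantic_file.stem + ".wav")
        if wav_file.exists():
            pairs.append((semantic_file, wav_file))
    return pairs
```

Checking the number of pairs before training is a cheap way to catch a generation run that stopped partway through.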

gitmylo · Sep 01 '23 11:09