Option to train on raw text

Open thewh1teagle opened this issue 10 months ago • 5 comments

I would like to train on raw text. I looked at the code for the codepoints feature, and it looks like it supports only the uk alphabet. I would like to train on raw Hebrew characters. Currently I just map them to random IPA phonemes, but I'm not sure that is a good idea. Maybe we could just use their UTF-8 values as the tokens themselves.
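To make the idea concrete, here is a minimal sketch of a character-level vocabulary for Hebrew. It is not the project's codepoints implementation: `SYMBOLS`, `SYMBOL_TO_ID`, `text_to_ids`, the special tokens, and the punctuation set are all hypothetical names and choices made for illustration. It also indexes by Unicode code point rather than by raw UTF-8 bytes, since each Hebrew letter is two bytes in UTF-8 and per-byte tokens would split every letter in half.

```python
# Hypothetical sketch: one token ID per character, not any library's actual API.
PAD, BOS, EOS = "<pad>", "<bos>", "<eos>"  # assumed special tokens

# Hebrew letters alef..tav (U+05D0..U+05EA), including the five final forms.
HEBREW_LETTERS = [chr(cp) for cp in range(0x05D0, 0x05EA + 1)]
PUNCTUATION = list(" .,!?-\"'")  # assumed punctuation set

SYMBOLS = [PAD, BOS, EOS] + PUNCTUATION + HEBREW_LETTERS
SYMBOL_TO_ID = {symbol: index for index, symbol in enumerate(SYMBOLS)}


def text_to_ids(text: str) -> list[int]:
    """Map raw text to token IDs, skipping characters outside the vocabulary."""
    ids = [SYMBOL_TO_ID[BOS]]
    for ch in text:
        # Unknown characters (Latin letters, niqqud, digits) are dropped here;
        # a real setup might normalize or expand them instead.
        if ch in SYMBOL_TO_ID:
            ids.append(SYMBOL_TO_ID[ch])
    ids.append(SYMBOL_TO_ID[EOS])
    return ids
```

A fixed, dense vocabulary like this keeps the embedding table small, whereas using the raw code point value as the ID would leave most of the table unused. Whether a model can learn pronunciation from unvocalized Hebrew text this way is a separate question that this sketch does not answer.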

May 24 '25 18:05 thewh1teagle

Hebrew

What is the latest update? In your experience, what is a workable solution for Hebrew in Piper?

May 29 '25 10:05 GeorgeS2019

@thewh1teagle

Is it good enough to be converted to ONNX for use in Piper and listed here as a Hebrew voice? https://github.com/rhasspy/piper/blob/master/VOICES.md

May 29 '25 10:05 GeorgeS2019

@GeorgeS2019 Yes, it's not perfect, but it is much better than any open source Hebrew model out there. Follow the link I sent for updates.

May 29 '25 10:05 thewh1teagle

@thewh1teagle

https://github.com/rhasspy/piper/discussions/795#discussioncomment-13308460

May 29 '25 11:05 GeorgeS2019

Example: https://github.com/rhasspy/piper/issues/757#issuecomment-2982457812

Jun 19 '25 21:06 GeorgeS2019