piper-phonemize
Multiple Phonemizer Support
The piper-phonemize setup is a bit confusing at the moment, as it is both included in piper with a significant amount of code and loaded as a library at runtime. The two phonemizers, text and espeak, are both tightly integrated into piper and piper-phonemize. Furthermore, they are linked with the espeak-ng library, which is licensed under the GPL, meaning piper-phonemize is also under the GPL (when distributed), and thus so is piper.
My proposal is this:
- Create a standard phonemizer interface between piper and piper-phonemize. This could be three functions: initialize, phonemize, and terminate. The initialize function could also take configuration data if required.
- Make the phonemizer selectable at startup via a flag instead of from the voice config. I'm not sure whether there is a technical reason the phonemes are configured in the voice .json file, but it doesn't seem strictly necessary as long as the phonemes match.
- Separate the phonemizers within piper-phonemize into different libraries that are loaded only if the configuration requires them. For example, on Linux, to phonemize text into a vector of phonemes using espeak:
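// Requires <dlfcn.h>; the library must export phonemize as extern "C" so the symbol name is not mangled.
void *libraryHandle = dlopen("phonemizer_espeak.so", RTLD_LAZY);
if (!libraryHandle) throw std::runtime_error(dlerror());
auto phonemizeFn = reinterpret_cast<void (*)(const std::string &, std::vector<std::vector<Phoneme>> &)>(dlsym(libraryHandle, "phonemize"));
if (!phonemizeFn) throw std::runtime_error("phonemize symbol not found");
phonemizeFn(text, phonemes);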
This would make it easy to integrate a new phonemizer without updating both programs, and would even allow a new library to be added without updating piper-phonemize. Plus, the dependency on espeak-ng would become optional, which means piper could be distributed under the much more permissive MIT license.
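To make the interface from the first two bullets more concrete, here is a rough sketch of what it could look like. Everything in it is hypothetical: the PhonemizerApi struct, the selectPhonemizer function, and the stand-in "text" phonemizer are names I made up for illustration, and none of them exist in piper or piper-phonemize today. I'm also assuming a Phoneme is a single Unicode codepoint (char32_t). The stand-in phonemizer lives in the same file so the sketch compiles on its own; in the real design each entry would be resolved from a separately loaded library.

```cpp
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Assumption: a phoneme is a single Unicode codepoint.
using Phoneme = char32_t;

// Hypothetical interface: the three entry points from the proposal.
struct PhonemizerApi {
    bool (*initialize)(const char *configJson);
    void (*phonemize)(const std::string &text,
                      std::vector<std::vector<Phoneme>> &phonemes);
    void (*terminate)();
};

// Stand-in "text" phonemizer for illustration only: every input byte
// becomes one phoneme, and a '.' closes the current sentence.
namespace text_phonemizer {
bool initialize(const char *) { return true; }

void phonemize(const std::string &text,
               std::vector<std::vector<Phoneme>> &phonemes) {
    std::vector<Phoneme> sentence;
    for (unsigned char c : text) {
        sentence.push_back(static_cast<Phoneme>(c));
        if (c == '.') {
            phonemes.push_back(sentence);
            sentence.clear();
        }
    }
    if (!sentence.empty()) phonemes.push_back(sentence);
}

void terminate() {}
}  // namespace text_phonemizer

// Resolve a phonemizer by name, e.g. the value of a hypothetical
// startup flag. Throws std::invalid_argument for unknown names.
const PhonemizerApi &selectPhonemizer(const std::string &name) {
    static const std::map<std::string, PhonemizerApi> registry = {
        {"text", PhonemizerApi{text_phonemizer::initialize,
                               text_phonemizer::phonemize,
                               text_phonemizer::terminate}},
    };
    auto it = registry.find(name);
    if (it == registry.end()) {
        throw std::invalid_argument("unknown phonemizer: " + name);
    }
    return it->second;
}
```

With something like this, piper would call selectPhonemizer once at startup with the flag value and then only ever talk to the three function pointers, so it would not need to know which phonemizer is behind them. An espeak entry could fill the same struct with pointers obtained from a dynamically loaded library.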
I can implement some of the changes to do this, but as it would be a fairly substantial change, I thought it would be best to discuss it first.