piper
Timings of phonemes as audio gets streamed raw
It would be very helpful if I could somehow get the timings of the phoneme frames as they are produced. I can't seem to find any way to do this, while other TTS implementations have this feature. Any help is welcome.
Thanks
I'm looking for the same thing, since I'd like to use it to animate an avatar. Perhaps the only way to do this is to run the WAV output through some other analysis software that detects phonemes (or at least the major ones, such as the vowels), but I have not yet found a good tool for that. It would of course be fantastic if Piper could do this directly.
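For the avatar use case, the remaining step is small once per-phoneme lengths are available from some source, whether an external aligner run on the WAV output or the synthesizer itself. The sketch below is only an illustration under assumptions: `phoneme_timings` is a hypothetical helper, not part of Piper, its input is assumed to be a list of (phoneme, number of audio samples) pairs, and the 22050 Hz default is an assumed sample rate that should be replaced with the voice's actual rate.

```python
from typing import List, Tuple


def phoneme_timings(
    counts: List[Tuple[str, int]],
    sample_rate: int = 22050,  # assumed default; use the voice's real sample rate
) -> List[Tuple[str, float, float]]:
    """Turn per-phoneme sample counts into (phoneme, start_s, end_s) tuples.

    Hypothetical helper: `counts` is assumed to come from an external
    aligner or similar source, not from any existing Piper API.
    """
    timings = []
    cursor = 0  # running position in samples from the start of the audio
    for phoneme, n_samples in counts:
        start = cursor / sample_rate
        cursor += n_samples
        end = cursor / sample_rate
        timings.append((phoneme, start, end))
    return timings


if __name__ == "__main__":
    # Made-up sample counts, purely for illustration.
    for phoneme, start, end in phoneme_timings([("h", 2205), ("a", 4410)]):
        print(f"{phoneme}: {start:.3f}s to {end:.3f}s")
```

Because the cursor is kept in samples rather than seconds, the same bookkeeping works when audio arrives in streamed chunks: the animation side only needs to compare the playback position against each start and end time.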