piper
Support for alignment output in TSV format
Adds support for alignment data output. This roughly addresses issue #364 and can be used as a base for #391 and #361. Text-to-speech runs twice: once for the normal audio generation and a second time for each word. Since the two passes produce different outputs and timings, a correction is applied. It is not perfect, but "good enough": both passes re-sync after each sentence, so only slight offsets are introduced.
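To make the idea of the correction concrete, here is a minimal sketch of one plausible form it could take: rescaling the per-word durations from the second pass so that they exactly fill the sentence duration from the first pass. This is an assumption for illustration only, not necessarily what the PR does; `rescale_word_times` and its parameters are hypothetical names.

```python
def rescale_word_times(word_durations, sentence_duration, sentence_start=0.0):
    """Stretch or shrink per-word durations so they exactly fill one sentence.

    Hypothetical helper: word_durations come from synthesizing each word on
    its own, sentence_duration from synthesizing the whole sentence.
    All values are in seconds.
    """
    total = sum(word_durations)
    if total <= 0:
        raise ValueError("word durations must sum to a positive value")

    # One scale factor per sentence, so timing re-syncs at sentence boundaries.
    scale = sentence_duration / total

    spans = []
    cursor = sentence_start
    for duration in word_durations:
        end = cursor + duration * scale
        spans.append((cursor, end))
        cursor = end
    return spans


# Three words measured at 1.0 s in total, but the sentence audio lasts 2.0 s
# and starts 1.0 s into the output file.
spans = rescale_word_times([0.2, 0.3, 0.5], sentence_duration=2.0, sentence_start=1.0)
```

Because the scale factor is recomputed for every sentence, any error stays confined to that sentence, which matches the "self sync after each sentence" behavior described above.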
I've been trying this out. Looks like when using a long text some of the last words are being skipped in the alignment file.
@vytskalt can you provide an example so I can debug/fix it?
Yes, this is the command I'm running:
cat text.txt | piper --sentence-silence 0.5 -m en_US-ryan-high --output_file out.wav --alignment-data alignment.tsv
This is the text (random Reddit post): text.txt
In the alignment.tsv, 2 of the last words are missing.
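For anyone who wants to check their own output for the same problem, a small sketch like the following can list the input words that never show up in the alignment file. It assumes the word is stored in the last tab-separated column of each row, which is a guess about the TSV layout rather than something stated in this thread; `missing_words` and `normalize` are hypothetical helpers.

```python
import difflib
import string


def normalize(word):
    """Lowercase and strip surrounding punctuation so 'jumps.' matches 'jumps'."""
    return word.strip(string.punctuation).lower()


def missing_words(text, tsv_text, word_column=-1):
    """Return input words that have no counterpart in the alignment rows.

    Assumption: each non-empty TSV row holds the word in `word_column`
    (the last column by default).
    """
    expected = [normalize(w) for w in text.split()]
    aligned = [
        normalize(line.split("\t")[word_column])
        for line in tsv_text.splitlines()
        if line.strip()
    ]

    matcher = difflib.SequenceMatcher(a=expected, b=aligned, autojunk=False)
    missing = []
    for tag, a_start, a_end, _b_start, _b_end in matcher.get_opcodes():
        # 'delete' and 'replace' both mean expected words were not found as-is.
        if tag in ("delete", "replace"):
            missing.extend(expected[a_start:a_end])
    return missing


sample_text = "The quick brown fox jumps."
sample_tsv = "0.0\t0.1\tthe\n0.1\t0.3\tquick\n0.3\t0.5\tbrown\n"
print(missing_words(sample_text, sample_tsv))
```

The sample data is made up; to use this on real output, read `text.txt` and `alignment.tsv` into strings and adjust `word_column` to the actual layout.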
OK, the issue is not the length but the content. For example, "musical/sport" is spoken as 3 words, and "in the" is merged into one spoken word. My word/phoneme sync trips over this. It needs to be fixed; I have to find another way to sync.
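A toy illustration of why a position-by-position sync breaks on such input: the whitespace-split text and the spoken units no longer line up one to one. The spoken units below are invented for the example (the thread does not say which 3 words "musical/sport" becomes), and `first_divergence` is a hypothetical helper, not code from this PR.

```python
def first_divergence(text_words, spoken_units):
    """Return the first index where positional pairing stops matching, or None."""
    for index, (word, unit) in enumerate(zip(text_words, spoken_units)):
        if word.lower() != unit.lower():
            return index
    if len(text_words) != len(spoken_units):
        # One sequence ran out first; the pairing breaks at its end.
        return min(len(text_words), len(spoken_units))
    return None


text_words = "I like musical/sport events in the city".split()

# Assumed spoken form: one text token expands to three units,
# and two text tokens collapse into one unit.
spoken_units = ["I", "like", "musical", "slash", "sport", "events", "in the", "city"]

print(len(text_words), len(spoken_units))
print(first_divergence(text_words, spoken_units))
```

In this toy pairing, everything after the first mismatch is shifted, and `zip` silently stops at the shorter sequence, so entries at the end go unpaired.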
Hi,
I pulled this pull request and made a build, but the --alignment-data option is not available in the "piper" executable in the install folder.
Am I missing something to make it work?
Thanks (:
It is only built into the Python script, not the C++ executable.
Makes sense! Thanks (: