Support for alignment output in TSV format

Open contentnation opened this issue 5 months ago • 7 comments

Adds support for alignment data output. This roughly matches issue #364 and can be used as a base for #391 and #361. It runs text to speech twice: once for the normal audio generation and a second time for each word. Since the two runs produce different outputs and timings, a correction is applied. Not perfect, but "good enough". Both runs resync after each sentence, so only slight offsets are created.
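As a rough illustration of what such a correction could look like (this is a sketch, not the code from this pull request): assume each word has a duration taken from synthesizing it on its own, and the sentence has a duration taken from the normal run. The per-word durations can then be scaled so they add up to the sentence duration. The function names, the seconds unit, and the start/end/word column layout are all assumptions made for this example.

```python
def align_words(words, word_durations, sentence_duration, start=0.0):
    """Scale per-word durations so that they sum to the sentence duration.

    words             -- words of one sentence, in spoken order
    word_durations    -- duration in seconds of each word synthesized alone
    sentence_duration -- duration in seconds of the sentence synthesized whole
    start             -- position of the sentence in the output audio

    Returns a list of (start, end, word) tuples in seconds.
    Hypothetical helper: the actual correction in the PR may differ.
    """
    if len(words) != len(word_durations):
        raise ValueError("need exactly one duration per word")
    total = sum(word_durations)
    if total <= 0:
        raise ValueError("word durations must sum to a positive value")

    scale = sentence_duration / total  # correction factor for this sentence
    rows = []
    cursor = start
    for word, duration in zip(words, word_durations):
        end = cursor + duration * scale
        rows.append((cursor, end, word))
        cursor = end
    return rows


def to_tsv(rows):
    """Format rows as tab-separated lines (assumed layout: start, end, word)."""
    return "\n".join(f"{s:.3f}\t{e:.3f}\t{w}" for s, e, w in rows)


if __name__ == "__main__":
    print(to_tsv(align_words(["hello", "world"], [0.5, 0.7], 1.0)))
```

Passing the real start position of every sentence as `start` means an error inside one sentence does not carry over into the next, which corresponds to the per-sentence resync described above.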

contentnation avatar Feb 23 '24 16:02 contentnation

I've been trying this out. Looks like when using a long text some of the last words are being skipped in the alignment file.

vytskalt avatar Apr 03 '24 11:04 vytskalt

@vytskalt can you provide an example so I can debug/fix it?

contentnation avatar Apr 03 '24 12:04 contentnation

Yes, this is the command I'm running:

cat text.txt | piper --sentence-silence 0.5 -m en_US-ryan-high --output_file out.wav --alignment-data alignment.tsv

This is the text (random Reddit post): text.txt

In the alignment.tsv, 2 of the last words are missing.
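For anyone who wants to reproduce this, here is a small sketch of a check that lists the input words that never show up in the alignment file. It assumes the word is stored in the last column of each TSV row; the column layout is a guess, so adjust `word_column` if the file looks different. The function names are made up for this example.

```python
import csv
import io


def _normalize(word):
    """Lowercase a word and strip surrounding punctuation for comparison."""
    return word.strip(".,!?;:\"'()").lower()


def missing_words(text, tsv_text, word_column=-1):
    """Return input words that cannot be matched, in order, in the TSV.

    text        -- the original input text
    tsv_text    -- contents of the alignment file
    word_column -- index of the column holding the word (assumption: last)
    """
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t",
                        quoting=csv.QUOTE_NONE)
    aligned = [_normalize(row[word_column]) for row in reader if row]

    missing = []
    pos = 0  # only search forward, so word order is respected
    for word in text.split():
        key = _normalize(word)
        if not key:
            continue
        try:
            pos = aligned.index(key, pos) + 1
        except ValueError:
            missing.append(word)
    return missing


if __name__ == "__main__":
    with open("text.txt", encoding="utf-8") as f:
        text = f.read()
    with open("alignment.tsv", encoding="utf-8") as f:
        tsv_text = f.read()
    print(missing_words(text, tsv_text))
```

The comparison is deliberately simple (whitespace split, punctuation stripped), so words that the synthesizer splits or merges will also be reported as missing.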

vytskalt avatar Apr 03 '24 13:04 vytskalt

OK, it's not the length that is the issue, it's the content. For example, "musical/sport" is spoken as 3 words, while "in the" is merged into one spoken word. My word/phoneme sync trips over this. This needs to be fixed; I have to find another way to sync.
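To make the failure mode concrete, here is a minimal sketch (not the PR's code) of a guard that only pairs written words with spoken units when the counts agree, and otherwise falls back to a single row for the whole sentence. The function name and the (start, end) representation of spoken units are assumptions for illustration.

```python
def sync_or_fallback(written_words, spoken_units, sentence_start, sentence_end):
    """Pair written words with spoken units, or fall back to one sentence row.

    written_words  -- words obtained by splitting the input text
    spoken_units   -- list of (start, end) times, one per spoken word
    sentence_start -- start of the sentence in the audio, in seconds
    sentence_end   -- end of the sentence in the audio, in seconds

    Returns a list of (start, end, text) tuples.
    """
    if len(written_words) == len(spoken_units):
        # Counts match, so a one-to-one pairing is plausible.
        return [(s, e, w) for w, (s, e) in zip(written_words, spoken_units)]
    # Counts differ (e.g. "musical/sport" spoken as 3 words): do not guess,
    # emit a single row covering the whole sentence instead.
    return [(sentence_start, sentence_end, " ".join(written_words))]
```

With this guard a mismatch costs word-level detail for one sentence instead of silently dropping or shifting words.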

contentnation avatar Apr 03 '24 13:04 contentnation

Hi,

I pulled this pull request and made a build, but the --alignment-data option is not available in the "piper" executable in the install folder.

Am I missing something to make it work?

Thanks (:

charlyhayoz avatar May 07 '24 15:05 charlyhayoz

It is only built into the Python script, not into the C++ executable.

contentnation avatar May 07 '24 15:05 contentnation

Makes sense! Thanks (:

charlyhayoz avatar May 08 '24 09:05 charlyhayoz