
The timestamp of model 'interspeech21' is incorrect

Open owaski opened this issue 2 years ago • 5 comments

I ran the following command:

python -m allosaurus.run --timestamp=True -i sample.wav -m interspeech21

and it gives me:

0.040 0.025 ɑ
0.080 0.025 l
0.100 0.025 ʌ
0.120 0.025 s
0.140 0.025 o
0.170 0.025 ɹ
0.180 0.025 ə
0.200 0.025 s

This is incorrect for the sample audio. It seems the window shift is set incorrectly.

owaski Apr 15 '22 18:04

I am struggling with the timing as well. Is anybody aware of a library that can do forced alignment of phonemes based on the output from allosaurus? I would really appreciate any input and tips on how I can improve the output from allosaurus.

SlistInc Apr 20 '22 05:04

I am also looking for something like this.

journeytosilius Apr 26 '22 22:04

Hi guys, sorry, I was a bit busy with other projects and my internship over the last few months and did not have time to look at this.

I forgot to account for the subsampling factor of the conv layer. I fixed it in the latest commit.

xinjli Jun 12 '22 22:06
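To illustrate why a missed subsampling factor skews timestamps, here is a minimal sketch. It is not the library's actual code: the function name `frame_to_seconds` and the numeric values (a 10 ms window shift and a factor of 3) are assumptions chosen only for the example. The idea is that a conv layer that subsamples the input makes each output frame span several input frames, so the frame index has to be scaled by that factor before converting to seconds.

```python
def frame_to_seconds(frame_index, window_shift=0.010, subsampling_factor=1):
    """Convert an output-frame index to an onset time in seconds.

    window_shift and subsampling_factor are hypothetical values for
    illustration; they are not taken from the allosaurus configuration.
    """
    return frame_index * window_shift * subsampling_factor


# Same output frame, with and without the (assumed) subsampling factor.
uncorrected = frame_to_seconds(10)                        # factor ignored
corrected = frame_to_seconds(10, subsampling_factor=3)    # factor applied

print(f"uncorrected: {uncorrected:.3f}s, corrected: {corrected:.3f}s")
```

Under these assumptions, every timestamp computed without the factor would be compressed by the same ratio, which would match the symptom of phones appearing too close together at the start of the audio.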

A very useful library -- thank you for creating it. I also have a timing issue. The onset of the phonemes seems to be reported correctly, but the duration of each shows as 0.045 regardless of how long each phoneme actually is. I need to detect pauses, so accurate durations would be very helpful. Here's the output I get:

0.840 0.045 ʔ
0.870 0.045 a
0.900 0.045 l̪
0.960 0.045 t̪
0.990 0.045 ɒ
1.080 0.045 k͡p̚
1.140 0.045 a
1.260 0.045 t̪
1.320 0.045 ɒ
1.380 0.045 t̪
1.440 0.045 ɒ
1.470 0.045 k

kzgajos Aug 30 '22 13:08
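As a possible workaround while durations are constant, the gaps between consecutive onsets can be used as a rough proxy. The sketch below is an assumption-laden heuristic, not a feature of allosaurus: the helper names (`parse_timestamps`, `onset_gaps`, `candidate_pauses`) and the 0.1 s threshold are made up for illustration. It parses the "onset duration phone" triples from the output above and flags places where the next onset comes unusually late.

```python
def parse_timestamps(text):
    """Parse whitespace-separated 'onset duration phone' triples."""
    tokens = text.split()
    rows = []
    for i in range(0, len(tokens) - 2, 3):
        rows.append((float(tokens[i]), float(tokens[i + 1]), tokens[i + 2]))
    return rows


def onset_gaps(rows):
    """Return (phone, next_phone, gap_in_seconds) for consecutive phones."""
    return [
        (cur[2], nxt[2], round(nxt[0] - cur[0], 3))
        for cur, nxt in zip(rows, rows[1:])
    ]


def candidate_pauses(rows, min_gap=0.1):
    """Flag gaps at or above min_gap (a hypothetical threshold)."""
    return [gap for gap in onset_gaps(rows) if gap[2] >= min_gap]


sample = """
0.840 0.045 ʔ 0.870 0.045 a 0.900 0.045 l̪ 0.960 0.045 t̪
0.990 0.045 ɒ 1.080 0.045 k͡p̚ 1.140 0.045 a 1.260 0.045 t̪
1.320 0.045 ɒ 1.380 0.045 t̪ 1.440 0.045 ɒ 1.470 0.045 k
"""

rows = parse_timestamps(sample)
pauses = candidate_pauses(rows)
for phone, next_phone, gap in pauses:
    print(f"{gap:.3f}s between onsets of {phone} and {next_phone}")
```

A long gap between onsets may be a long phone rather than silence, so this only narrows down where to look. Combining it with an energy-based silence check on the audio would likely be more reliable.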