allosaurus
The timestamp of model 'interspeech21' is incorrect
I ran the following command:
python -m allosaurus.run --timestamp=True -i sample.wav -m interspeech21
and it gives me
0.040 0.025 ɑ
0.080 0.025 l
0.100 0.025 ʌ
0.120 0.025 s
0.140 0.025 o
0.170 0.025 ɹ
0.180 0.025 ə
0.200 0.025 s
This is incorrect for the sample audio. It seems the window shift is set incorrectly.
I am struggling with the timing as well. Is anybody aware of a library that can do forced alignment of phonemes based on the output from allosaurus? I would really appreciate any input and tips on how I can improve the output from allosaurus.
I am also looking for something like this
Hi guys, sorry I was a bit busy with other projects and my internship in the last few months and did not have time to look at it.
I forgot to account for the subsampling factor of the conv layer; I fixed it in the latest commit.
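For anyone wondering why a missing subsampling factor compresses the timestamps, here is a minimal sketch of the arithmetic. It is not the actual allosaurus code: the function name, the 10 ms window shift, and the subsampling factor of 3 are all assumptions chosen for illustration. The point is only that each output frame of a subsampling conv layer covers several input frames, so the frame index has to be scaled by that factor before converting to seconds.

```python
def frame_onset_seconds(frame_index, window_shift_ms=10, subsampling=1):
    """Convert an output-frame index to an onset time in seconds.

    Hypothetical helper for illustration only; window_shift_ms and
    subsampling are assumed values, not read from any allosaurus model.
    Integer milliseconds are used to avoid float accumulation errors.
    """
    return frame_index * window_shift_ms * subsampling / 1000


# Ignoring the subsampling factor places frame 4 at 0.04 s ...
print(frame_onset_seconds(4))
# ... while an assumed factor of 3 places the same frame at 0.12 s.
print(frame_onset_seconds(4, subsampling=3))
```

If the factor is left out, every onset is too early by that same multiple, which would match the symptom of a whole utterance being squeezed into a fraction of a second.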
A very useful library -- thank you for creating it. I also have a timing issue. The onset of the phonemes seems to be reported correctly, but the duration of each shows as 0.045 regardless of how long each phoneme actually is. I need to detect pauses so accurate durations would be very helpful. Here's the output I get:
0.840 0.045 ʔ
0.870 0.045 a
0.900 0.045 l̪
0.960 0.045 t̪
0.990 0.045 ɒ
1.080 0.045 k͡p̚
1.140 0.045 a
1.260 0.045 t̪
1.320 0.045 ɒ
1.380 0.045 t̪
1.440 0.045 ɒ
1.470 0.045 k
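Until real durations are available, one possible workaround is to look at the spacing between consecutive onsets instead of the reported duration. The sketch below is only a heuristic under stated assumptions: it treats the output as whitespace-separated `start duration phone` triples, the function names and the 0.1 s threshold are made up for illustration, and a large gap cannot distinguish a long phoneme from a silent pause.

```python
def parse_timestamps(text):
    """Parse 'start duration phone' triples from allosaurus timestamp output.

    Works whether the triples are on separate lines or on a single line.
    """
    tokens = text.split()
    return [
        (float(tokens[i]), float(tokens[i + 1]), tokens[i + 2])
        for i in range(0, len(tokens) - 2, 3)
    ]


def find_onset_gaps(entries, min_gap=0.1):
    """Return (phone, next_phone, gap) where consecutive onsets are far apart.

    min_gap is an arbitrary, assumed threshold in seconds.
    """
    gaps = []
    for (start, _, phone), (next_start, _, next_phone) in zip(entries, entries[1:]):
        gap = round(next_start - start, 3)
        if gap >= min_gap:
            gaps.append((phone, next_phone, gap))
    return gaps


sample = "1.080 0.045 k͡p̚ 1.140 0.045 a 1.260 0.045 t̪ 1.320 0.045 ɒ"
print(find_onset_gaps(parse_timestamps(sample)))
```

On this excerpt the only spacing at or above the threshold is the 0.12 s between `a` and the following `t̪`; whether that is a pause or just a long vowel would still need to be checked against the audio.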