icefall WIP: Add doc about FST-based CTC forced alignment.

Open csukuangfj opened this issue 1 year ago • 2 comments

It is based on the CTC forced alignment API tutorial from torchaudio, but uses an FST-based approach.
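For readers unfamiliar with the idea, here is a minimal pure-Python sketch of what alignment over a CTC graph means. It is not the code from this PR, kaldi-decoder, or torchaudio. It assumes the transcript is expanded into a linear graph (blank, token, blank, token, ..., blank) and that a Viterbi search picks the best path through it. The function name `ctc_forced_align` and the toy inputs are made up for illustration.

```python
import math
from typing import List, Sequence


def ctc_forced_align(
    log_probs: Sequence[Sequence[float]],  # shape (num_frames, vocab_size)
    tokens: Sequence[int],                 # transcript as token IDs (no blanks)
    blank: int = 0,
) -> List[int]:
    """Return the best per-frame label sequence (blanks included)."""
    num_frames = len(log_probs)
    # Linear CTC graph for the transcript: blank, t1, blank, t2, ..., blank.
    states = [blank]
    for tok in tokens:
        states += [tok, blank]
    num_states = len(states)

    neg_inf = -math.inf
    score = [[neg_inf] * num_states for _ in range(num_frames)]
    back = [[0] * num_states for _ in range(num_frames)]

    # A path may start in the leading blank or in the first token.
    score[0][0] = log_probs[0][states[0]]
    if num_states > 1:
        score[0][1] = log_probs[0][states[1]]

    for t in range(1, num_frames):
        for s in range(num_states):
            # Self-loop: stay in the same state.
            best_prev, best = s, score[t - 1][s]
            # Advance from the previous state.
            if s >= 1 and score[t - 1][s - 1] > best:
                best_prev, best = s - 1, score[t - 1][s - 1]
            # Skip the blank between two different tokens.
            if (
                s >= 2
                and states[s] != blank
                and states[s] != states[s - 2]
                and score[t - 1][s - 2] > best
            ):
                best_prev, best = s - 2, score[t - 1][s - 2]
            score[t][s] = best + log_probs[t][states[s]]
            back[t][s] = best_prev

    # A path may end in the trailing blank or in the last token.
    s = num_states - 1
    if num_states > 1 and score[-1][num_states - 2] > score[-1][s]:
        s = num_states - 2
    if score[-1][s] == neg_inf:
        raise ValueError("Too few frames to align this transcript")

    # Trace back the best path from the last frame to the first.
    path = [blank] * num_frames
    for t in range(num_frames - 1, -1, -1):
        path[t] = states[s]
        s = back[t][s]
    return path


# Toy example: vocabulary {0: blank, 1: "a", 2: "b"}, 5 frames.
hi, lo = math.log(0.8), math.log(0.1)
toy_log_probs = [
    [hi, lo, lo],
    [lo, hi, lo],
    [lo, hi, lo],
    [lo, lo, hi],
    [hi, lo, lo],
]
alignment = ctc_forced_align(toy_log_probs, tokens=[1, 2])
print(alignment)  # [0, 1, 1, 2, 0]
```

An FST toolkit does the same search more generally: the transcript graph is composed with the frame-level scores and the shortest path is extracted, which is why the result can match a dedicated forced-alignment routine.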

I can produce output identical to torchaudio's using https://github.com/k2-fsa/kaldi-decoder. [Screenshot 2024-01-30 at 19 31 36]

I am refactoring the code and will prepare at least two colab notebooks.

Jan 30 '24 11:01 csukuangfj

Hi, can the alignment tool produce accurate word timestamps for both the start and end positions?

Feb 07 '24 07:02 whaozl

Hi, can the alignment tool produce accurate word timestamps for both the start and end positions?

It depends on what model you use.

You can have a look at https://pytorch.org/audio/main/tutorials/ctc_forced_alignment_api_tutorial.html. We can produce results identical to torchaudio's when using the same model.
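To make the timestamp question concrete, here is a rough sketch of how a frame-level alignment can be turned into start and end times. It is an illustration, not code from this PR or torchaudio. The names `TokenSpan`, `merge_frames`, and `span_to_seconds` are hypothetical, and the 0.02 s frame shift is an assumption that has to be replaced with the actual stride of the model in use. Boundaries can be no finer than one frame, and where a CTC model places its emissions depends on the model, which is why accuracy is model-dependent.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class TokenSpan:
    token: int
    start_frame: int  # inclusive
    end_frame: int    # exclusive


def merge_frames(alignment: Sequence[int], blank: int = 0) -> List[TokenSpan]:
    """Group consecutive identical non-blank frames into token spans."""
    spans: List[TokenSpan] = []
    prev = blank
    for t, label in enumerate(alignment):
        if label != blank:
            if label == prev:
                # Same token continues: extend the current span.
                spans[-1].end_frame = t + 1
            else:
                # A new token starts at this frame.
                spans.append(TokenSpan(label, t, t + 1))
        prev = label
    return spans


def span_to_seconds(span: TokenSpan, frame_shift_s: float) -> Tuple[float, float]:
    """Convert frame indices to seconds for a given frame shift."""
    return span.start_frame * frame_shift_s, span.end_frame * frame_shift_s


# Frame-level alignment with 0 as the blank ID.
frame_alignment = [0, 1, 1, 2, 0]
spans = merge_frames(frame_alignment)
for span in spans:
    # 0.02 s per frame is an assumed value, not taken from this thread.
    start, end = span_to_seconds(span, frame_shift_s=0.02)
    print(span.token, f"{start:.2f}", f"{end:.2f}")
```

Word-level timestamps would then come from grouping the token spans that belong to each word and taking the first start and the last end.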

Feb 07 '24 11:02 csukuangfj

@csukuangfj Will it be completed soon?

Jun 10 '24 16:06 lifeiteng

@csukuangfj Will it be completed soon?

Yes. I am working on it now.

Jun 12 '24 07:06 csukuangfj

@csukuangfj Has the k2-based approach been forgotten? [Screenshot 2024-06-12 19 50 39]

Jun 12 '24 11:06 lifeiteng

No, it is still a TODO. For now, please use the first approach, or add the second, k2-based approach yourself. All the APIs you need are already there; you only need to combine them.

Jun 12 '24 11:06 csukuangfj
