icefall

WIP: Add doc about FST-based CTC forced alignment.

Open csukuangfj opened this issue 1 year ago • 2 comments

It is based on the CTC forced alignment API tutorial from torchaudio, but it uses an FST-based approach.

I can produce output identical to torchaudio's using https://github.com/k2-fsa/kaldi-decoder.

I am refactoring the code and will prepare at least two colab notebooks.
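
For readers new to the topic, here is a minimal pure-Python sketch of what CTC forced alignment computes. It is not the code from this PR and it does not use the kaldi-decoder or k2 APIs; the function name `ctc_forced_align` and its arguments are hypothetical. It runs a Viterbi search over the linear state sequence blank, token, blank, token, ..., blank, which is roughly the graph one would expect from composing a CTC topology with a linear transcript FST.

```python
import math


def ctc_forced_align(log_probs, tokens, blank=0):
    """Return the best frame-level label sequence for `tokens` under CTC.

    log_probs: list of T frames, each a list of V log-probabilities.
    tokens: non-empty list of target token ids (must not contain `blank`).
    (Hypothetical helper for illustration only.)
    """
    # Linear state sequence: blank, t1, blank, t2, ..., blank
    states = [blank]
    for tok in tokens:
        states += [tok, blank]
    num_states, num_frames = len(states), len(log_probs)

    neg_inf = -math.inf
    score = [[neg_inf] * num_states for _ in range(num_frames)]
    back = [[0] * num_states for _ in range(num_frames)]

    # A path may start in the leading blank or in the first token.
    score[0][0] = log_probs[0][states[0]]
    score[0][1] = log_probs[0][states[1]]

    for t in range(1, num_frames):
        for s in range(num_states):
            # Allowed predecessors: stay, advance by one state, or skip a
            # blank between two different tokens.
            cands = [s]
            if s >= 1:
                cands.append(s - 1)
            if s >= 2 and states[s] != blank and states[s] != states[s - 2]:
                cands.append(s - 2)
            best = max(cands, key=lambda p: score[t - 1][p])
            if score[t - 1][best] == neg_inf:
                continue  # state s is not reachable at frame t
            score[t][s] = score[t - 1][best] + log_probs[t][states[s]]
            back[t][s] = best

    # A path may end in the trailing blank or in the last token.
    last = num_frames - 1
    end = num_states - 1
    if score[last][num_states - 2] > score[last][end]:
        end = num_states - 2
    if score[last][end] == neg_inf:
        raise ValueError("not enough frames to align the given tokens")

    # Trace back the best state path and map states to labels.
    path = [end]
    for t in range(last, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return [states[s] for s in path]
```

The returned list has one label per frame, with `blank` marking frames that belong to no token. With a real acoustic model, `log_probs` would be its per-frame log-softmax output.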

csukuangfj avatar Jan 30 '24 11:01 csukuangfj

Hi, can the alignment tool produce word timestamps that are accurate at both the begin and end positions?

whaozl avatar Feb 07 '24 07:02 whaozl

> Hi, can the alignment tool produce word timestamps that are accurate at both the begin and end positions?

It depends on what model you use.

You can have a look at https://pytorch.org/audio/main/tutorials/ctc_forced_alignment_api_tutorial.html. We can produce results identical to torchaudio's when using the same model.
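
To make the timestamp question concrete, here is a small sketch of how a frame-level alignment could be turned into begin and end times. The helper name `frames_to_spans` and the default frame shift of 0.02 s are assumptions for illustration; the real frame shift depends on the model's subsampling, which is one reason the accuracy depends on the model.

```python
from itertools import groupby


def frames_to_spans(frame_labels, frame_shift_s=0.02, blank=0):
    """Merge per-frame labels into (label, start_s, end_s) spans.

    frame_labels: one label per frame, as produced by a CTC alignment.
    frame_shift_s: assumed duration of one output frame in seconds.
    (Hypothetical helper for illustration only.)
    """
    spans = []
    start = 0  # index of the first frame of the current run
    for label, group in groupby(frame_labels):
        n = len(list(group))
        if label != blank:
            # Blank frames are skipped, so gaps between tokens are kept.
            spans.append((label, start * frame_shift_s, (start + n) * frame_shift_s))
        start += n
    return spans
```

Word-level timestamps would then come from grouping the token spans that make up each word and taking the first start and the last end.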

csukuangfj avatar Feb 07 '24 11:02 csukuangfj

@csukuangfj Will it be completed soon?

lifeiteng avatar Jun 10 '24 16:06 lifeiteng

> @csukuangfj Will it be completed soon?

Yes. I am working on it now.

csukuangfj avatar Jun 12 '24 07:06 csukuangfj

@csukuangfj Has the k2-based approach been forgotten?

lifeiteng avatar Jun 12 '24 11:06 lifeiteng

No, it is still a TODO. For now, please use the first approach, or add the second, k2-based approach yourself. All the APIs you need are already there; you only need to combine them.

csukuangfj avatar Jun 12 '24 11:06 csukuangfj