token level timestep

Open Mddct opened this issue 4 years ago • 6 comments

Is it possible to output token-level timestamps?

e.g.: hello 100-600, world 712-900, ...

Mddct avatar Dec 14 '20 14:12 Mddct

@Mddct I don't quite understand your question and example. Currently we tokenize a string (i.e. the label) into a list of characters or a list of subwords.

nglehuy avatar Dec 15 '20 13:12 nglehuy

I mean outputting each label together with its start time and end time in the original wav.

Mddct avatar Dec 15 '20 13:12 Mddct

@Mddct Oh, currently we don't support that feature yet. But I'll look into it. Anyway, if you have any idea of doing that, especially for rnn transducer, can you update here?

nglehuy avatar Dec 15 '20 14:12 nglehuy

I am trying to implement this feature for the transducer now, but I could not find any related papers or articles.

Mddct avatar Dec 16 '20 01:12 Mddct

There's this but for CTC, maybe we can apply it with some modification.

nglehuy avatar Dec 16 '20 02:12 nglehuy
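
For readers unfamiliar with the CTC case mentioned above, here is a minimal sketch of one common way to read token-level start and end times off a frame-level greedy CTC path. It is not necessarily the method referred to in the comment above, and it does not use any TensorFlowASR API. The function name `ctc_token_spans`, the blank id `0`, and the 40 ms frame duration are all assumptions made for illustration.

```python
from typing import List, Tuple


def ctc_token_spans(
    frame_ids: List[int],
    blank: int = 0,          # assumed blank id; depends on the tokenizer
    frame_ms: float = 40.0,  # assumed duration of one encoder frame
) -> List[Tuple[int, float, float]]:
    """Collapse a per-frame argmax path into (token_id, start_ms, end_ms)."""
    spans = []
    prev = blank
    start = 0
    for t, tok in enumerate(frame_ids):
        if tok != prev:
            # The previous run ends at frame t; keep it unless it was blank.
            if prev != blank:
                spans.append((prev, start * frame_ms, t * frame_ms))
            start = t
            prev = tok
    # Close a run that lasts until the final frame.
    if prev != blank:
        spans.append((prev, start * frame_ms, len(frame_ids) * frame_ms))
    return spans


# Hypothetical per-frame argmax ids for a short utterance.
path = [0, 0, 5, 5, 0, 7, 7, 7, 0]
print(ctc_token_spans(path))
```

Each run of identical non-blank ids becomes one token span, so the boundaries are only as precise as the frame rate of the encoder output.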

@usimarit It seems to involve two steps: (1) calculate mean_start_shift and mean_end_shift; (2) apply the shifts to each start and end time.

I will evaluate the accuracy later. But for other languages or our own corpora, we need to do forced alignment to get the time information.

Mddct avatar Dec 16 '20 04:12 Mddct
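
The two steps described in the last comment could be sketched roughly as follows. This is one interpretation, not code from the thread or from TensorFlowASR: the helper names `estimate_shifts` and `apply_shifts` are hypothetical, the shift is assumed to be "reference minus predicted", and the reference boundaries are assumed to come from a forced alignment of the same tokens.

```python
from statistics import mean
from typing import List, Tuple

Span = Tuple[float, float]  # (start, end) of one token, in any consistent unit


def estimate_shifts(predicted: List[Span], reference: List[Span]) -> Tuple[float, float]:
    """Step 1: average offset between reference and predicted boundaries."""
    mean_start_shift = mean(r[0] - p[0] for p, r in zip(predicted, reference))
    mean_end_shift = mean(r[1] - p[1] for p, r in zip(predicted, reference))
    return mean_start_shift, mean_end_shift


def apply_shifts(spans: List[Span], mean_start_shift: float, mean_end_shift: float) -> List[Span]:
    """Step 2: move every predicted start and end by the mean shifts."""
    return [(s + mean_start_shift, e + mean_end_shift) for s, e in spans]


# Hypothetical calibration data: model output vs. forced-alignment reference.
predicted = [(100, 600), (700, 900)]
reference = [(110, 590), (712, 900)]

start_shift, end_shift = estimate_shifts(predicted, reference)
print(apply_shifts([(100, 600)], start_shift, end_shift))
```

A single global shift only corrects a systematic bias; it cannot fix per-token errors, which is why the accuracy still needs to be evaluated on held-out aligned data.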