TensorFlowASR token-level timestamps

Is it possible to output token-level timestamps?

e.g.: hello 100-600 world 712-900 ...

Dec 14 '20 14:12 Mddct

@Mddct I don't quite understand your question and example. Currently we tokenize each string (i.e. the label) into a list of characters or a list of subwords.

Dec 15 '20 13:12 nglehuy

I mean outputting each label together with its start time and end time in the original wav.

Dec 15 '20 13:12 Mddct

@Mddct Oh, we don't support that feature yet, but I'll look into it. Anyway, if you have any ideas on how to do that, especially for the RNN transducer, can you post an update here?

Dec 15 '20 14:12 nglehuy

I am trying to implement this feature for the transducer now, but I could not find any related papers or articles.

Dec 16 '20 01:12 Mddct

There's this, but it's for CTC; maybe we can apply it with some modifications.
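
For context, one common way to read timestamps off a CTC model is to take the per-frame argmax path, drop blanks, and treat each run of identical token ids as one token span. The sketch below illustrates that idea only; it is not necessarily what the reference above does, and it is not TensorFlowASR code. The function name `ctc_token_times`, the blank id, and the 40 ms frame duration are assumptions chosen for illustration (the real frame duration depends on the feature frame shift and the model's time reduction).

```python
from typing import List, Tuple


def ctc_token_times(
    frame_ids: List[int], blank: int, frame_ms: int
) -> List[Tuple[int, int, int]]:
    """Collapse a per-frame argmax path into (token_id, start_ms, end_ms) spans.

    frame_ids: best token id for every output frame (hypothetical input).
    blank:     id of the CTC blank symbol (assumed, model dependent).
    frame_ms:  duration of one output frame in milliseconds (assumed).
    """
    spans: List[Tuple[int, int, int]] = []
    prev = blank
    for t, tok in enumerate(frame_ids):
        if tok != blank and tok == prev:
            # Same token repeated on consecutive frames: extend the last span.
            token_id, start, _ = spans[-1]
            spans[-1] = (token_id, start, (t + 1) * frame_ms)
        elif tok != blank:
            # A new token starts at frame t and covers at least one frame.
            spans.append((tok, t * frame_ms, (t + 1) * frame_ms))
        prev = tok
    return spans


if __name__ == "__main__":
    path = [0, 1, 1, 0, 2, 2, 2, 0]  # 0 is the blank in this toy example
    for token_id, start, end in ctc_token_times(path, blank=0, frame_ms=40):
        print(token_id, f"{start}-{end}")
```

CTC outputs tend to be peaky, so spans obtained this way may be shorter than the spoken token and can lag or lead the audio, which is one reason a later correction step can be useful.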

Dec 16 '20 02:12 nglehuy

@usimarit It seems to take two steps: (1) calculate mean_start_shift and mean_end_shift; (2) apply the shifts to each start and end.

I will evaluate the accuracy later. For other languages or a custom corpus, though, we would need to run forced alignment to get the time information.
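
A minimal sketch of the two steps described above, under stated assumptions: the shift is taken to be reference time minus predicted time, the reference times come from a forced alignment of some calibration data, and all times share one unit. Only `mean_start_shift` and `mean_end_shift` are names from the discussion; `estimate_shifts`, `apply_shifts`, and the sample numbers are hypothetical.

```python
from statistics import mean
from typing import List, Tuple

Span = Tuple[float, float]  # (start, end), same time unit everywhere


def estimate_shifts(predicted: List[Span], reference: List[Span]) -> Tuple[float, float]:
    """Step 1: average offset between reference and predicted boundaries.

    Assumes predicted[i] and reference[i] describe the same token.
    """
    mean_start_shift = mean(r[0] - p[0] for p, r in zip(predicted, reference))
    mean_end_shift = mean(r[1] - p[1] for p, r in zip(predicted, reference))
    return mean_start_shift, mean_end_shift


def apply_shifts(spans: List[Span], mean_start_shift: float, mean_end_shift: float) -> List[Span]:
    """Step 2: move every predicted start and end by the corresponding mean shift."""
    return [(s + mean_start_shift, e + mean_end_shift) for s, e in spans]


if __name__ == "__main__":
    # Made-up calibration pairs, for illustration only.
    predicted = [(100, 600), (700, 900)]
    reference = [(90, 620), (680, 920)]
    start_shift, end_shift = estimate_shifts(predicted, reference)
    print(apply_shifts(predicted, start_shift, end_shift))
```

A single global shift only corrects a systematic bias; it cannot fix errors that differ from token to token, so the accuracy still has to be measured against aligned data.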

Dec 16 '20 04:12 Mddct
