TensorFlowASR
Token-level timestamps
Is it possible to output token-level timestamps?
e.g.: hello 100-600 world 712-900 ...
@Mddct I don't quite understand your question and example. Currently we tokenize a string (i.e. the label) into a list of characters or a list of subwords.
I mean outputting each label together with its start time and end time in the original wav.
@Mddct Oh, we don't support that feature yet, but I'll look into it. If you have any ideas for doing that, especially for the RNN transducer, can you post an update here?
I am trying to implement this feature for the transducer now, but I could not find any related papers or articles.
There's this, but it is for CTC; maybe we can apply it with some modification.
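For reference, here is a minimal sketch of one common way to read token times off a greedy CTC path. It is not necessarily what the linked approach does. The function name `ctc_token_times`, the `blank_id` value, and the `frame_ms` value are all assumptions for illustration; the real frame duration depends on the feature hop and any subsampling in the encoder.

```python
def ctc_token_times(frame_ids, blank_id=0, frame_ms=40):
    """Turn a greedy CTC path (one token id per frame) into
    a list of (token_id, start_ms, end_ms) tuples.

    blank_id and frame_ms are assumed values, not TensorFlowASR defaults.
    """
    spans = []  # each entry: [token_id, first_frame, last_frame + 1]
    prev = blank_id
    for t, tok in enumerate(frame_ids):
        if tok != blank_id:
            if tok != prev:
                # A new token starts at this frame.
                spans.append([tok, t, t + 1])
            else:
                # Same token repeated: extend the current span.
                spans[-1][2] = t + 1
        prev = tok
    return [(tok, s * frame_ms, e * frame_ms) for tok, s, e in spans]


if __name__ == "__main__":
    path = [0, 5, 5, 0, 5, 7, 7, 7, 0]
    for token_id, start_ms, end_ms in ctc_token_times(path):
        print(token_id, f"{start_ms}-{end_ms}")
```

For a transducer the same idea would need the frame index at which each non-blank symbol is emitted during decoding, which this sketch does not cover.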
@usimarit It seems to involve two steps: 1. calculate mean_start_shift and mean_end_shift; 2. apply the shifts to each start and end.
I will evaluate the accuracy later. But for other languages or our own corpus, we need to do forced alignment to get the time information.
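A minimal sketch of those two steps, under my own assumptions: the shifts are taken as the mean difference between reference times (e.g. from forced alignment) and the model's predicted times, and times are in milliseconds. The helper names `estimate_mean_shifts` and `apply_shifts` are hypothetical and not part of TensorFlowASR.

```python
from statistics import mean


def estimate_mean_shifts(predicted, reference):
    """Step 1: compute mean_start_shift and mean_end_shift.

    predicted, reference: equal-length lists of (start, end) pairs
    for the same tokens. Assumes shift = reference - predicted.
    """
    mean_start_shift = mean(r[0] - p[0] for p, r in zip(predicted, reference))
    mean_end_shift = mean(r[1] - p[1] for p, r in zip(predicted, reference))
    return mean_start_shift, mean_end_shift


def apply_shifts(predicted, mean_start_shift, mean_end_shift):
    """Step 2: apply the shifts to each start and end."""
    return [(s + mean_start_shift, e + mean_end_shift) for s, e in predicted]


if __name__ == "__main__":
    predicted = [(100, 600), (712, 900)]   # model output (made-up numbers)
    reference = [(80, 590), (700, 880)]    # forced alignment (made-up numbers)
    start_shift, end_shift = estimate_mean_shifts(predicted, reference)
    print(apply_shifts(predicted, start_shift, end_shift))
```

The shifts would be estimated once on a held-out set that has reference alignments and then reused at inference time, which is why a new language or corpus needs forced alignment first.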