end2end
Clarification: Gram-CTC alphabet_size? Should the logits represent "a", "b", or "ab" as a single output?
Thank you very much for your implementation of CTC variants. To be frank, I think these implementations are the main value of this repo, and I would rename it to "pytorch-ctc-variants" or something similar, because these great implementations are very hard to find elsewhere, whereas OCR code is fairly prevalent.
Just to clarify: for Gram-CTC, should the logits represent single characters such as "a" and "b", or grams such as "ab"?
And just to validate: this is an implementation of https://arxiv.org/pdf/1703.00096.pdf, right?
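As I read the paper, the output layer emits one logit per gram in the chosen gram vocabulary (each unigram and multi-character gram, plus the CTC blank), rather than one per basic character. A minimal sketch of what the logits shape would then look like (the gram vocabulary below is a made-up example, not taken from this repo):

```python
import torch

# Hypothetical gram vocabulary: unigrams "a" and "b", the bigram "ab",
# plus a blank symbol required by CTC.
gram_vocab = ["<blank>", "a", "b", "ab"]

T, batch = 10, 2  # time steps and batch size (arbitrary values)
# One logit per gram per time step, not per basic character.
logits = torch.randn(T, batch, len(gram_vocab))
log_probs = torch.log_softmax(logits, dim=-1)
print(log_probs.shape)  # torch.Size([10, 2, 4])
```

So with a basic alphabet of size 2 and one extra bigram, the last logits dimension would be 4, not 3.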
Thanks, Dan
From reading the code, it does look like this is actually Gram-CTC, but the test doesn't run: it's missing the mandatory `grams` input.
Based on:

```python
max_gram_length = len(grams.shape)
if max_gram_length >= 4:
    raise NotImplementedError
# num_basic_labels = grams.shape[0]
```
should `grams` be a tensor of shape [(alphabet_size + 1) x (alphabet_size + 1) x (alphabet_size + 1)] for grams of maximum length 3?
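If that reading is right, a sketch of such a tensor might be an indicator over all possible grams, with index 0 reserved for blank/padding (this format is an assumption on my part, since the expected input isn't documented):

```python
import torch

# Hypothetical basic alphabet {"a", "b"} plus blank at index 0,
# so each gram position has alphabet_size + 1 == 3 possible indices.
alphabet = ["<blank>", "a", "b"]
V = len(alphabet)  # 3

# Assumed format: a (V x V x V) indicator tensor for grams up to length 3,
# where grams[i, j, k] == 1 marks the gram (i, j, k) as allowed.
grams = torch.zeros(V, V, V, dtype=torch.uint8)

# Unigrams "a" and "b": trailing positions padded with blank (index 0).
grams[1, 0, 0] = 1  # "a"
grams[2, 0, 0] = 1  # "b"
# Bigram "ab":
grams[1, 2, 0] = 1  # "ab"

# len(grams.shape) == 3 would then pass the max_gram_length check above.
assert len(grams.shape) == 3
print(int(grams.sum()))  # → 3 allowed grams
```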
I'm sorry, Gram-CTC is not yet implemented, but it is the first-priority future task (https://github.com/artbataev/end2end#future-plans), and I'm working on it. For now, only the CTC loss and the CTC beam search decoder with a language model are working: https://artbataev.github.io/end2end/pytorch_end2end.html
OK, thank you for your work. Good luck!