
Clarification: Gram-CTC alphabet_size - single-character logits ("a", "b") or gram logits ("ab")?

Open danFromTelAviv opened this issue 6 years ago • 3 comments

Thank you very much for your implementation of CTC variants. To be frank, I think that is the main value of this repo, and I would rename it to "pytorch ctc variants" or something of the like, because it is very, very hard to find these great implementations you made; OCR, on the other hand, is pretty prevalent.

Just to clarify: for Gram-CTC, should the logits represent single characters such as "a" and "b", or grams such as "ab"?

And just to validate: this is an implementation of https://arxiv.org/pdf/1703.00096.pdf, right?
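
My current reading of the paper is that the output alphabet is the whole gram set (uni-grams and longer grams) plus blank, so single characters and multi-character grams each get their own logit. A tiny sketch of what I mean (just my understanding of the paper, not this repo's API; the names here are made up):

    # Sketch only: how I understand the Gram-CTC output layer.
    # One logit per gram in the chosen gram set G, plus one for blank.
    alphabet = ["a", "b"]        # basic (uni-gram) labels
    extra_grams = ["ab"]         # hypothetical bi-gram added to the gram set
    gram_set = ["<blank>"] + alphabet + extra_grams

    num_logits = len(gram_set)   # 4 here: blank, "a", "b", "ab"
    # i.e. the acoustic model's final layer would output num_logits values
    # per time step, covering both single characters and grams.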

Thanks, Dan

danFromTelAviv avatar Feb 19 '19 14:02 danFromTelAviv

From reading the code, it does look like this actually is Gram-CTC, but the test doesn't run: it's missing the mandatory grams input.

Based on:

    max_gram_length = len(grams.shape)
    if max_gram_length >= 4:
        raise NotImplementedError
    # num_basic_labels = grams.shape[0]

Should it be a tensor of shape [(alphabet_size + 1) x (alphabet_size + 1) x (alphabet_size + 1)] for grams of maximum length 3?
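
In other words, something like this? (Purely my guess from the lines above, not a documented interface:)

    import torch

    # Guess: a dense lookup over label indices, where entry [i, j, k] marks
    # whether the gram made of labels (i, j, k) is in the gram set,
    # with index 0 used as blank / padding.
    alphabet_size = 2                 # "a", "b"
    V = alphabet_size + 1             # +1 for blank/padding at index 0
    grams = torch.zeros(V, V, V)      # 3 dims -> max gram length 3?

    # uni-grams "a" (index 1) and "b" (index 2), padded with zeros
    grams[1, 0, 0] = 1
    grams[2, 0, 0] = 1
    # bi-gram "ab"
    grams[1, 2, 0] = 1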

danFromTelAviv avatar Feb 19 '19 14:02 danFromTelAviv

I'm sorry, Gram-CTC is not implemented yet, but it is the first-priority future task (https://github.com/artbataev/end2end#future-plans), and I'm working on it. For now, only the CTC loss and the CTC beam search decoder with a language model are working: https://artbataev.github.io/end2end/pytorch_end2end.html
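
For reference, a plain (non-gram) CTC loss in stock PyTorch is used roughly like this; this is torch.nn.CTCLoss, shown only for comparison, the modules in this project have their own interface documented at the link above:

    import torch
    import torch.nn as nn

    # Stock PyTorch CTC loss, not this project's module.
    T, N, C = 50, 4, 5                                   # time steps, batch, classes (blank = 0)
    log_probs = torch.randn(T, N, C).log_softmax(2)      # per-frame log-probabilities
    targets = torch.randint(1, C, (N, 10), dtype=torch.long)
    input_lengths = torch.full((N,), T, dtype=torch.long)
    target_lengths = torch.full((N,), 10, dtype=torch.long)

    ctc = nn.CTCLoss(blank=0)
    loss = ctc(log_probs, targets, input_lengths, target_lengths)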

artbataev avatar Feb 19 '19 19:02 artbataev

Ok, thank you for your work. Good luck!

danFromTelAviv avatar Feb 19 '19 20:02 danFromTelAviv