Upcoming support for new model architectures

Open frgfm opened this issue 3 years ago • 0 comments

As discussed in several GH issues, docTR could very well welcome new architectures for OCR :+1: Let's use this issue to track this for the next release!

A few things to consider:

docTR is not meant to make all architectures available. Let's focus on architectures that are reasonably sized and reach SOTA performance (or are considered a performance milestone for a given task).
It is acceptable to start with an implementation for only one DL backend, although full support needs to be added gradually over the following releases.
For faster iterations, training should be performed on synthetic data when available (performance will be pushed on private datasets later, once the potential of the architecture is validated). A PR adding the implementation of a given architecture should come with the exact args used in training, so that the training and the corresponding performance can be reproduced.
We always have to credit the rightful contributors: papers are always cited in docTR, and an implementation is not meant to be a copy-paste of another implementation. However, if part of someone else's code is used, that author should be credited ("borrowed from", "inspired by", etc.).
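
To illustrate the point about sharing the exact training args in a PR, here is a minimal sketch of one way to capture them. It assumes an argparse-based script; the parser, the argument names (`arch`, `--epochs`, `--lr`, `--seed`) and the helper `dump_args` are hypothetical and are not docTR's actual training CLI.

```python
import argparse
import json


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical arguments, for illustration only (not docTR's real options)
    parser = argparse.ArgumentParser(description="Toy training script")
    parser.add_argument("arch", type=str, help="name of the architecture to train")
    parser.add_argument("--epochs", type=int, default=10)
    parser.add_argument("--lr", type=float, default=1e-3)
    parser.add_argument("--seed", type=int, default=42)
    return parser


def dump_args(args: argparse.Namespace) -> str:
    # Sorted keys give a stable text form that can be pasted into a PR description
    return json.dumps(vars(args), indent=2, sort_keys=True)


# Parse an explicit list instead of sys.argv so the example is self-contained
example_args = build_parser().parse_args(["my_arch", "--epochs", "20"])
print(dump_args(example_args))
```

Pasting the resulting JSON next to the reported metrics would let a reviewer rerun the training with the same settings, defaults included.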

Here is the list of envisioned models:

Text detection

[ ] PAN - Pixel Aggregation Network (paper, implementation)

Text recognition

[x] ViTSTR (paper, implementation) #513
[ ] ParSeq (paper, implementation) #1003

Aug 01 '22 15:08 frgfm