Request For Adding ParSeq - text recognition model
The feature
Hello,
I mainly use the text detection and text recognition models with your framework.
From what I have seen, the most recent text recognition models you provide, namely MASTER and SAR, are not yet operational.
However, there is a recent text recognition model that achieves very impressive performance: ParSeq.
Here are the references:
- https://github.com/open-mmlab/mmocr/blob/main/configs/textrecog/master/README.md
- https://github.com/open-mmlab/mmocr/blob/main/configs/textrecog/abinet/README.md
- https://paperswithcode.com/paper/scene-text-recognition-with-permuted#code
- https://github.com/baudm/parseq
Would it be possible to add this text recognition model to the ones you provide?
Thanks a lot for your work!
Motivation, pitch
I'm working with text recognition models, and a recent state-of-the-art model outperforms the others on all test datasets. I would like to use this model with your framework's pipelines.
Alternatives
No response
Additional context
No response
Hi @nikokks, FYI MASTER and SAR are fixed in both frameworks on the main branch and will be released soon with 0.5.2.
I agree with the ParSeq addition. Initially I had ViTSTR in mind, but yes, I think we can switch directly to ParSeq instead.
Would you be interested in opening a PR for this model? We would be happy to help! Otherwise we could take it on in the near future; we definitely plan to add more models, and I totally agree that ParSeq should be one of them :)
Have you tested the model's latency on CPU? That would be interesting to see.
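For anyone curious about the latency question, here is one quick way to measure it: a generic PyTorch timing sketch, assuming the torch.hub entry point documented in the parseq README (which pulls in parseq's own dependencies). This is not a doctr utility.

```python
import time
import torch

# Pretrained model via the hub entry point from the baudm/parseq README
model = torch.hub.load('baudm/parseq', 'parseq', pretrained=True).eval()
dummy = torch.rand(1, 3, 32, 128)  # ParSeq's documented input size

with torch.no_grad():
    for _ in range(3):  # warm-up iterations so timings stabilize
        model(dummy)
    start = time.perf_counter()
    for _ in range(10):
        model(dummy)
print(f"{(time.perf_counter() - start) / 10 * 1000:.1f} ms / image on CPU")
```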
I agree that this could be a good candidate for new text recognition models in 0.6.0 :) Perhaps we should open a tracker issue for all new model requests that we would stage for 0.6.0, or directly link them in the release tracker?
@frgfm I would say a separate issue where we can track all requested model additions (split into detection / recognition / TF / PT, with paper/repo links) and link this issue in the release tracker. WDYT? Would you like to open it? :)
@felixdittrich92 Done :)
@nikokks @frgfm Do you agree that we can close this ticket? It should be fine if we track it in the mentioned issue :)
I think we should keep this:
- the PR will close this issue, and the release tracker will get the checkbox ticked
So no need to close it, and that will notify @nikokks when this gets resolved, which I guess is of interest to him :)
I am inspecting baudm's ParSeq code (https://github.com/baudm/parseq). I think the code might not be too complicated to integrate.
On my side, I managed to hook into your ocr_predictor and plug in baudm's reco_predictor successfully. The performance is actually not that good on French documents like yours => it needs to be fine-tuned on your private data
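For reference, a minimal sketch of that kind of glue: running baudm's ParSeq (using the torch.hub entry point and tokenizer decode call from its README) on a single word crop. The preprocessing values (32x128 crops, mean/std 0.5) follow the parseq README; the crop handling is simplified and `word_crop.png` is a placeholder.

```python
import torch
from PIL import Image
from torchvision import transforms

# Pretrained ParSeq from the hub entry point documented in baudm/parseq
parseq = torch.hub.load('baudm/parseq', 'parseq', pretrained=True).eval()

# ParSeq expects 32x128 RGB crops normalized with mean=0.5, std=0.5
preprocess = transforms.Compose([
    transforms.Resize((32, 128)),
    transforms.ToTensor(),
    transforms.Normalize(0.5, 0.5),
])

crop = Image.open('word_crop.png').convert('RGB')  # one detected word box
batch = preprocess(crop).unsqueeze(0)

with torch.no_grad():
    logits = parseq(batch)
probs = logits.softmax(-1)
labels, confidences = parseq.tokenizer.decode(probs)  # decode API from the README
print(labels[0], confidences[0])
```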
I will have several questions about the implementation and integration choices in doctr:
- Can we integrate libraries like timm?
- Do I have to implement it in TF too (PyTorch only, if possible)?
- Can I add an additional utils.py in the future ParSeq folder, if that does not go against your conventions?
- etc.
I will refine my questions next weekend.
You can close this issue.
Hi @nikokks,
let's keep it open for further conversation about ParSeq.
About your points:
- timm is a great library, but the libraries that use it mostly pin it to a fixed version specifier for maintenance reasons, and it would be another extra dependency in doctr. So I would suggest implementing ViT from scratch inside the classification models and using it as the feature extractor; see the rough sketch after this comment (@frgfm we should move the transformer components from models/recognition to models so that we can reuse the modules, WDYT?)
- It would be great to implement it in both frameworks, but it's also fine if you only want to focus on PT... anyone else could do the translation.
- If it is stuff that is only ParSeq-related and does not depend on PT or TF, I would say base.py is the right place (but let's take a look once we see it in a PR).
We are definitely happy to help. I would say: if you are ready, open a PR (starting with the classification ViT integration) and we iterate on it, WDYT?
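To make the "ViT from scratch" point concrete, here is a rough sketch of a from-scratch ViT feature extractor in plain PyTorch (no timm). All hyperparameters, names, and shapes are illustrative, not doctr's final design; the encoder reuses nn.TransformerEncoder for brevity.

```python
import torch
from torch import nn

class PatchEmbedding(nn.Module):
    """Split an image into patches and project them to the embedding dim."""
    def __init__(self, img_size=(32, 128), patch_size=4, in_ch=3, embed_dim=384):
        super().__init__()
        self.num_patches = (img_size[0] // patch_size) * (img_size[1] // patch_size)
        # A strided conv both cuts the image into patches and embeds them
        self.proj = nn.Conv2d(in_ch, embed_dim, kernel_size=patch_size, stride=patch_size)

    def forward(self, x):
        x = self.proj(x)                     # (N, D, H/P, W/P)
        return x.flatten(2).transpose(1, 2)  # (N, num_patches, D)

class ViT(nn.Module):
    """Tiny ViT encoder usable as a feature extractor for a recognition decoder."""
    def __init__(self, embed_dim=384, depth=6, num_heads=6, **patch_kwargs):
        super().__init__()
        self.patch_embed = PatchEmbedding(embed_dim=embed_dim, **patch_kwargs)
        self.cls_token = nn.Parameter(torch.zeros(1, 1, embed_dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, self.patch_embed.num_patches + 1, embed_dim))
        layer = nn.TransformerEncoderLayer(embed_dim, num_heads, dim_feedforward=4 * embed_dim, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, depth)

    def forward(self, x):
        x = self.patch_embed(x)
        cls = self.cls_token.expand(x.shape[0], -1, -1)
        x = torch.cat([cls, x], dim=1) + self.pos_embed
        return self.encoder(x)  # (N, 1 + num_patches, D): features for a decoder

feats = ViT()(torch.rand(2, 3, 32, 128))
print(feats.shape)  # torch.Size([2, 257, 384])
```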
Hi,
OK for timm.
Another question: can we add 'pytorch-lightning~=1.6.5' to requirements-pt.txt?
Hi @nikokks, why do you think we need this? :) We should implement all models in plain PyTorch / TensorFlow without any wrappers.
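In other words, training stays an explicit loop rather than a LightningModule. A minimal sketch with a placeholder model and dataset, just to illustrate the "no wrappers" idea:

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

model = nn.Linear(10, 2)  # stand-in for a recognition model
loader = DataLoader(
    TensorDataset(torch.rand(64, 10), torch.randint(0, 2, (64,))),
    batch_size=8,
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# The usual explicit PyTorch training loop, no framework wrapper needed
model.train()
for epoch in range(2):
    for inputs, targets in loader:
        optimizer.zero_grad()
        loss = criterion(model(inputs), targets)
        loss.backward()
        optimizer.step()
```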
OK, that sounds good to me :)
I have added a ParSeq class on my fork. Now I need to match all the arguments of each method between your wrapper and the ParSeq model class (the most difficult part) =) I'll do it in the next few days or next weekend.
@nikokks I would suggest the following steps (each should be its own PR):
- move models/recognition/transformer to models/modules/transformer (to reuse the implementations; we will need this for many more models, so it should be more global, @frgfm WDYT?)
- implement ViT as a classification model
- implement the ParSeq decoder side, which uses ViT as the feature extractor (a simplified sketch follows below)
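To illustrate that decoder step: a standard transformer decoder cross-attending to ViT features. The real ParSeq uses permuted autoregressive training; this plain causal version only shows how the pieces connect, and all names and sizes are illustrative.

```python
import torch
from torch import nn

class RecognitionDecoder(nn.Module):
    """Simplified decoder: token embeddings cross-attend to encoder features."""
    def __init__(self, vocab_size=100, embed_dim=384, num_heads=6, depth=1, max_len=32):
        super().__init__()
        self.tok_embed = nn.Embedding(vocab_size, embed_dim)
        self.pos_embed = nn.Parameter(torch.zeros(1, max_len, embed_dim))
        layer = nn.TransformerDecoderLayer(embed_dim, num_heads, dim_feedforward=4 * embed_dim, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, depth)
        self.head = nn.Linear(embed_dim, vocab_size)

    def forward(self, tokens, memory):
        # tokens: (N, T) int ids, memory: (N, S, D) features from the ViT encoder
        seq_len = tokens.shape[1]
        x = self.tok_embed(tokens) + self.pos_embed[:, :seq_len]
        causal = nn.Transformer.generate_square_subsequent_mask(seq_len)
        return self.head(self.decoder(x, memory, tgt_mask=causal))

memory = torch.rand(2, 257, 384)  # e.g. output of the ViT sketch above
tokens = torch.zeros(2, 10, dtype=torch.long)
logits = RecognitionDecoder()(tokens, memory)
print(logits.shape)  # torch.Size([2, 10, 100])
```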
> move models/recognition/transformer to models/modules/transformer (to reuse the implementations; we will need this for many more models, so it should be more global, @frgfm WDYT?)
I agree :+1:
> implement ViT as a classification model
Yup, but let's give credit to the rightful contributors / sources of inspiration where relevant!
@nikokks Now you can reuse the already implemented transformer parts for ViT :+1:
Hi @nikokks, short update: I have not forgotten this. I will (hopefully) start with ViTSTR next week; then it should be easy to implement the decoder from ParSeq as well.
Hi @nikokks, are you still interested in adding ParSeq? :) After #1063, ViT has finally found its way into doctr as a backbone. And #1055 will be updated soon (it could also serve as a template for ParSeq).
Hello, I am currently implementing ParSeq. It is now working with quite good predictions :) Do you have any advice on making a good pull request? Best
Hey @nikokks,
You can take a look at https://github.com/mindee/doctr/pull/1055/files, which I think is a good template (implementation, tests, docs) for implementing ParSeq in doctr :) Otherwise, open a PR and we will guide you step by step.
PS: If you only have the PT implementation, that's fine; we can port it to TF later :)
This will be finished once #1227 and #1228 are merged :)