
Request For Adding ParSeq - text recognition model

Open nikokks opened this issue 2 years ago β€’ 17 comments

πŸš€ The feature

Hello,

I mainly use the text detection and text recognition models with your framework.

From what I have seen, the most recent text recognition models you offer, namely MASTER and SAR, are not yet operational.

However, on the text recognition side, there is a recent model that achieves very impressive performance: ParSeq.

Here are the references:

  • https://github.com/open-mmlab/mmocr/blob/main/configs/textrecog/master/README.md
  • https://github.com/open-mmlab/mmocr/blob/main/configs/textrecog/abinet/README.md
  • https://paperswithcode.com/paper/scene-text-recognition-with-permuted#code
  • https://github.com/baudm/parseq

Would it be possible to add this text recognition model to the ones you offer?

Thanks a lot for your work!

Motivation, pitch

I'm working with text recognition models, and a recent state-of-the-art model outperforms the existing ones on all test datasets. I would like to use this model with your framework's pipelines.

Alternatives

No response

Additional context

No response

nikokks avatar Jul 30 '22 12:07 nikokks

Hi @nikokks 👋, FYI: MASTER and SAR are fixed in both frameworks on the main branch and will be released soon with 0.5.2.

I agree with the ParSeq addition. At first I had ViTSTR in mind, but yes, I think we can switch directly to ParSeq instead 👍

Are you maybe interested in opening a PR for this model? We would be happy to help with it! Otherwise we could take it on in the near future; we definitely have more model additions on track, and I totally agree that ParSeq should be one of them :)

Do you have experience with / have you tested the model's latency on CPU? It would be interesting to see.
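
(For reference, a rough timing sketch like the one below would already be helpful. It assumes the torch.hub entrypoint documented in the baudm/parseq README; the run counts and input shape are illustrative.)

```python
import time

import torch

# Load the reference ParSeq weights via torch.hub (see the baudm/parseq README)
model = torch.hub.load("baudm/parseq", "parseq", pretrained=True).eval()

# Dummy batch of one cropped word image (ParSeq's default input size is 32x128)
dummy = torch.rand(1, 3, 32, 128)

with torch.inference_mode():
    # Warm-up runs so one-time initialization does not skew the numbers
    for _ in range(5):
        model(dummy)
    start = time.perf_counter()
    for _ in range(50):
        model(dummy)
    elapsed = time.perf_counter() - start

print(f"mean CPU latency: {1000 * elapsed / 50:.1f} ms / crop")
```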

felixdittrich92 avatar Jul 30 '22 21:07 felixdittrich92

I agree that this could be a good candidate for the new text recognition models in 0.6.0 :) Perhaps we should open a tracker issue for all new model requests that we would stage for 0.6.0, or directly link them in the release tracker?

frgfm avatar Aug 01 '22 08:08 frgfm

@frgfm I would say a separate issue where we can track all requested model additions (split into detection / recognition / TF / PT, with paper/repo links) and link this issue in the release tracker. WDYT? Would you like to open it? :)

felixdittrich92 avatar Aug 01 '22 08:08 felixdittrich92

@felixdittrich92 Done :)

frgfm avatar Aug 01 '22 15:08 frgfm

@nikokks @frgfm Do you agree if we close this ticket? It should be fine if we track it in the mentioned issue :)

felixdittrich92 avatar Aug 01 '22 16:08 felixdittrich92

I think we should keep this open:

  • the PR will close this issue, and the release tracker will get the checkbox ticked

So no need to close it, and that will notify @nikokks when this gets resolved, which I guess is of interest to him :)

frgfm avatar Aug 01 '22 17:08 frgfm

πŸ‘

felixdittrich92 avatar Aug 01 '22 17:08 felixdittrich92

I am inspecting baudm's ParSeq code (https://github.com/baudm/parseq). I think the code might not be too complicated to integrate.

On my side, I managed to connect your ocr_predictor and integrate baudm's reco_predictor successfully. The performance is actually not too good for French documents like yours => it needs to be fine-tuned on your secret data 😉
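
Roughly, the recognition side of that experiment looks like the snippet below (a sketch following the usage shown in the baudm/parseq README; the glue that feeds doctr's detected crops into it is omitted, since the box format depends on the doctr version, and "word_crop.png" stands in for a crop produced by the detection stage):

```python
import torch
from PIL import Image
from torchvision import transforms

# Load the reference ParSeq weights via torch.hub (see the baudm/parseq README)
parseq = torch.hub.load("baudm/parseq", "parseq", pretrained=True).eval()

# ParSeq expects 32x128 crops normalized to [-1, 1]
preprocess = transforms.Compose([
    transforms.Resize((32, 128), transforms.InterpolationMode.BICUBIC),
    transforms.ToTensor(),
    transforms.Normalize(0.5, 0.5),
])

crop = preprocess(Image.open("word_crop.png").convert("RGB")).unsqueeze(0)

with torch.inference_mode():
    logits = parseq(crop)

# Greedy decoding through the model's own tokenizer
label, confidence = parseq.tokenizer.decode(logits.softmax(-1))
print(label)
```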

I will have several questions about implementation choices and integration into doctr:

  • Can we integrate libraries like timm?
  • Do I have to do it in TF too (torch only, if possible)?
  • Can I add an additional utils.py in the future ParSeq folder, or is that not your usual practice?
  • etc.

I will clarify my questions next weekend 😀

You can close this issue

nikokks avatar Aug 01 '22 22:08 nikokks

Hi @nikokks πŸ‘‹ ,

Let's keep it open for further conversation about ParSeq 👍

About your points:

  • timm is a great library, but the libraries that use it mostly pin it to a fixed version specifier for maintenance reasons, and it would be another extra dependency in doctr. So I would suggest implementing ViT from scratch inside the classification models and using it as a feature extractor; a rough sketch of the idea follows below this list. (@frgfm we should move the transformer components from models/recognition to models so that we can reuse the modules, WDYT?)
  • It would be great to implement it in both frameworks, but it's also fine if you only want to focus on PT ... someone else could do the translation 👍
  • If it is stuff that is only ParSeq-related and does not depend on PT or TF, I would say base.py is the right place (but let's take a look when we see it in a PR 😅)
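
To make the first point concrete, here is a minimal sketch of what such a from-scratch ViT feature extractor could look like (this is not doctr's actual implementation; the sizes follow the ViT-Small configuration ParSeq uses, and the class token and final norm are omitted for brevity):

```python
import torch
from torch import nn


class PatchEmbedding(nn.Module):
    """Split the image into patches and project them to the embedding dim."""

    def __init__(self, img_size=(32, 128), patch_size=(4, 8), in_ch=3, embed_dim=384):
        super().__init__()
        self.proj = nn.Conv2d(in_ch, embed_dim, kernel_size=patch_size, stride=patch_size)
        self.num_patches = (img_size[0] // patch_size[0]) * (img_size[1] // patch_size[1])

    def forward(self, x):
        # (B, C, H, W) -> (B, num_patches, embed_dim)
        return self.proj(x).flatten(2).transpose(1, 2)


class ViTFeatureExtractor(nn.Module):
    """Minimal ViT encoder returning one feature vector per patch."""

    def __init__(self, embed_dim=384, depth=12, num_heads=6):
        super().__init__()
        self.patch_embed = PatchEmbedding(embed_dim=embed_dim)
        self.pos_embed = nn.Parameter(torch.zeros(1, self.patch_embed.num_patches, embed_dim))
        layer = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=num_heads, dim_feedforward=4 * embed_dim,
            batch_first=True, norm_first=True,
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)

    def forward(self, x):
        feats = self.patch_embed(x) + self.pos_embed
        return self.encoder(feats)  # (B, num_patches, embed_dim)
```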

We are definitely happy to help with this. I would say, if you are ready, open a PR (starting with the classification ViT integration) and we iterate on it. WDYT?

felixdittrich92 avatar Aug 02 '22 08:08 felixdittrich92

Hi,

ok for timm.

Another question: can we add 'pytorch-lightning~=1.6.5' to requirements-pt.txt?

nikokks avatar Aug 06 '22 15:08 nikokks

Hi @nikokks 👋, why do you think we need this? :) We should implement all models in plain PyTorch / TensorFlow without any wrappers.
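
To illustrate what "plain PyTorch, no wrappers" means in practice, here is a minimal sketch (all names are illustrative) of the explicit training loop one would write instead of relying on a pl.LightningModule / Trainer; padding handling and scheduling are left out:

```python
from torch import nn


def train_one_epoch(model, loader, optimizer, device="cpu"):
    model.train()
    criterion = nn.CrossEntropyLoss()
    for images, targets in loader:
        images, targets = images.to(device), targets.to(device)
        optimizer.zero_grad()
        logits = model(images)  # (B, seq_len, vocab_size)
        # Token-level cross-entropy: flatten batch and sequence dims
        loss = criterion(logits.flatten(0, 1), targets.flatten())
        loss.backward()
        optimizer.step()
```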

felixdittrich92 avatar Aug 06 '22 15:08 felixdittrich92

OK, that sounds good to me :)

I have added the ParSeq class on my fork. Now I need to match all the arguments of each method between your wrapper and the ParSeq model class =) (the most difficult part). I'll do it in the coming days or next weekend.
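
For illustration only, the kind of signature matching involved could look roughly like this (a hypothetical adapter; the argument names mimic the pattern doctr's recognition models follow, but the actual integration is a native reimplementation rather than a wrapper):

```python
from torch import nn


class ParSeqAdapter(nn.Module):
    """Hypothetical shim mapping baudm's ParSeq forward (plain logits)
    onto a doctr-style dict-returning forward."""

    def __init__(self, parseq):
        super().__init__()
        self.parseq = parseq

    def forward(self, x, target=None, return_model_output=False, return_preds=False):
        logits = self.parseq(x)
        out = {}
        if return_model_output:
            out["out_map"] = logits
        if return_preds:
            # Decode through ParSeq's own tokenizer
            out["preds"] = self.parseq.tokenizer.decode(logits.softmax(-1))
        return out
```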

nikokks avatar Aug 06 '22 16:08 nikokks

@nikokks I would suggest the following steps (each should be its own PR):

  • Move models/recognition/transformer to models/modules/transformer (to reuse the implementations; we need this for many more models, so it should be more global. @frgfm WDYT?)
  • Implement ViT as a classification model.
  • Implement the ParSeq decoder side, which uses ViT as a feature extractor (a very reduced sketch follows below this list).
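
For illustration, the core of such a decoder could look roughly like this (a heavily reduced sketch: the real ParSeq adds permuted autoregressive training and iterative refinement on top of this single cross-attention read-out; vocab_size and max_len are illustrative):

```python
import torch
from torch import nn


class ParSeqStyleDecoder(nn.Module):
    """Reduced sketch: learned position queries cross-attend over the ViT
    feature memory and are projected to per-character logits."""

    def __init__(self, embed_dim=384, num_heads=12, vocab_size=97, max_len=25):
        super().__init__()
        # One query per output position (+1 for the end-of-sequence slot)
        self.pos_queries = nn.Parameter(torch.zeros(1, max_len + 1, embed_dim))
        self.cross_attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(embed_dim)
        self.head = nn.Linear(embed_dim, vocab_size)

    def forward(self, memory):
        # memory: (B, num_patches, embed_dim) from the ViT encoder
        q = self.pos_queries.expand(memory.size(0), -1, -1)
        attn_out, _ = self.cross_attn(q, memory, memory)
        return self.head(self.norm(attn_out))  # (B, max_len + 1, vocab_size)
```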

felixdittrich92 avatar Aug 06 '22 18:08 felixdittrich92

  • Move models/recognition/transformer to models/modules/transformer (to reuse the implementations; we need this for many more models, so it should be more global. @frgfm WDYT?)

I agree :+1:

  • Implement ViT as a classification model.

Yup, but giving credit to the rightful contributors / sources of inspiration where relevant!

frgfm avatar Aug 08 '22 05:08 frgfm

@nikokks Now you can reuse the already implemented transformer parts for ViT :+1:

felixdittrich92 avatar Aug 08 '22 09:08 felixdittrich92

Hi @nikokks, a short update: I have not forgotten this. I will (hopefully) start with ViTSTR next week; then it should also be easy to implement the decoder from ParSeq 👍

felixdittrich92 avatar Sep 04 '22 17:09 felixdittrich92

Hi @nikokks, are you still interested in adding ParSeq? :) After #1063, ViT has finally found its way into doctr as a backbone. And #1055 will be updated soon (it could also be used as a template for ParSeq).

felixdittrich92 avatar Sep 16 '22 13:09 felixdittrich92

Hello, I am currently implementing ParSeq. It is now working with quite good predictions :) Do you have any advice for making a good pull request? Best

nikokks avatar May 29 '23 19:05 nikokks

Hey @nikokks πŸ‘‹ ,

You can take a look at https://github.com/mindee/doctr/pull/1055/files, which I think is a good template (implementation, tests, docs) for implementing ParSeq in doctr :) Otherwise, open a PR and we will guide you step by step 👍

PS: If you only have the PT implementation, that's fine; we can port it to TF later :)

felixdittrich92 avatar May 29 '23 20:05 felixdittrich92

Finished after #1227 and #1228 are merged :)

felixdittrich92 avatar Jun 23 '23 06:06 felixdittrich92