Request For Adding ParSeq - text recognition model
The feature
Hello,
I mainly use the text detection and text recognition models with your framework.
From what I have seen, the most recent text recognition models you provide, namely MASTER and SAR, are not yet operational.
However, there is a recent text recognition model that achieves very impressive performance: ParSeq.
Here are the references:
- https://github.com/open-mmlab/mmocr/blob/main/configs/textrecog/master/README.md
- https://github.com/open-mmlab/mmocr/blob/main/configs/textrecog/abinet/README.md
- https://paperswithcode.com/paper/scene-text-recognition-with-permuted#code
- https://github.com/baudm/parseq
Would it be possible to add this text recognition model to the ones you provide?
Thanks a lot for your work!
Motivation, pitch
I'm working with text recognition models, and a recent state-of-the-art model outperforms the others on all test datasets. I would like to use this model with your framework's pipelines.
Alternatives
No response
Additional context
No response
Hi @nikokks, FYI MASTER and SAR are fixed in both frameworks on the main branch and will be released soon with 0.5.2.
I agree with the ParSeq addition. Initially I had ViTSTR in mind, but yes, I think we can switch directly to ParSeq instead.
Would you be interested in opening a PR for this model? We would be happy to help! Otherwise we could take it on in the near future; we definitely plan to add more models, and I totally agree that ParSeq should be one of them :)
Have you tested the model's latency on CPU? That would be interesting to see.
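For anyone curious about the latency question, here is one quick way to measure it: a generic PyTorch timing sketch, assuming the torch.hub entry point documented in the parseq README (which pulls in parseq's own dependencies). This is not a doctr utility.

```python
import time
import torch

# Pretrained model via the hub entry point from the baudm/parseq README
model = torch.hub.load('baudm/parseq', 'parseq', pretrained=True).eval()
dummy = torch.rand(1, 3, 32, 128)  # ParSeq's documented input size

with torch.no_grad():
    for _ in range(3):  # warm-up iterations so timings stabilize
        model(dummy)
    start = time.perf_counter()
    for _ in range(10):
        model(dummy)
print(f"{(time.perf_counter() - start) / 10 * 1000:.1f} ms / image on CPU")
```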
I agree that this could be a good candidate for new text recognition models in 0.6.0 :) Perhaps we should open a tracker issue for all new model requests that we would stage for 0.6.0, or directly link them in the release tracker?
@frgfm I would say a separate issue where we can track all requested model additions (split into detection / recognition / TF / PT, with paper/repo links) and link this issue in the release tracker. WDYT? Would you like to open it? :)
@felixdittrich92 Done :)
@nikokks @frgfm Do you agree that we can close this ticket? It should be fine if we track it in the mentioned issue :)
I think we should keep this:
- the PR will close this issue, and the release tracker will get the checkbox ticked
So no need to close it, and that will notify @nikokks when this gets resolved, which I guess is of interest to him :)
I am inspecting baudm's ParSeq code (https://github.com/baudm/parseq). I think the code might not be too complicated to integrate.
On my side, I managed to hook into your ocr_predictor and plug in baudm's reco_predictor successfully. The performance is actually not that good on French documents like yours => it needs to be fine-tuned on your private data
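For reference, a minimal sketch of that kind of glue: running baudm's ParSeq (using the torch.hub entry point and tokenizer decode call from its README) on a single word crop. The preprocessing values (32x128 crops, mean/std 0.5) follow the parseq README; the crop handling is simplified and `word_crop.png` is a placeholder.

```python
import torch
from PIL import Image
from torchvision import transforms

# Pretrained ParSeq from the hub entry point documented in baudm/parseq
parseq = torch.hub.load('baudm/parseq', 'parseq', pretrained=True).eval()

# ParSeq expects 32x128 RGB crops normalized with mean=0.5, std=0.5
preprocess = transforms.Compose([
    transforms.Resize((32, 128)),
    transforms.ToTensor(),
    transforms.Normalize(0.5, 0.5),
])

crop = Image.open('word_crop.png').convert('RGB')  # one detected word box
batch = preprocess(crop).unsqueeze(0)

with torch.no_grad():
    logits = parseq(batch)
probs = logits.softmax(-1)
labels, confidences = parseq.tokenizer.decode(probs)  # decode API from the README
print(labels[0], confidences[0])
```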
I will have several questions about the implementation and integration choices in doctr:
- Can we integrate libraries like timm?
- Do I have to implement it in TF too (PyTorch only, if possible)?
- Can I add an additional utils.py in the future ParSeq folder, if that does not go against your conventions?
- etc.
I will refine my questions next weekend.
You can close this issue.
Hi @nikokks,
let's keep it open for further conversation about ParSeq.
About your points:
- timm is a great library, but the libraries that use it mostly pin it to a fixed version specifier for maintenance reasons, and it would be another extra dependency in doctr. So I would suggest implementing ViT from scratch inside the classification models and using it as the feature extractor; see the rough sketch after this comment (@frgfm we should move the transformer components from models/recognition to models so that we can reuse the modules, WDYT?)
- It would be great to implement it in both frameworks, but it's also fine if you only want to focus on PT... anyone else could do the translation.
- If it is stuff that is only ParSeq-related and does not depend on PT or TF, I would say base.py is the right place (but let's take a look once we see it in a PR).
We are definitely happy to help. I would say: if you are ready, open a PR (starting with the classification ViT integration) and we iterate on it, WDYT?
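To make the "ViT from scratch" point concrete, here is a rough sketch of a from-scratch ViT feature extractor in plain PyTorch (no timm). All hyperparameters, names, and shapes are illustrative, not doctr's final design; the encoder reuses nn.TransformerEncoder for brevity.

```python
import torch
from torch import nn

class PatchEmbedding(nn.Module):
    """Split an image into patches and project them to the embedding dim."""
    def __init__(self, img_size=(32, 128), patch_size=4, in_ch=3, embed_dim=384):
        super().__init__()
        self.num_patches = (img_size[0] // patch_size) * (img_size[1] // patch_size)
        # A strided conv both cuts the image into patches and embeds them
        self.proj = nn.Conv2d(in_ch, embed_dim, kernel_size=patch_size, stride=patch_size)

    def forward(self, x):
        x = self.proj(x)                     # (N, D, H/P, W/P)
        return x.flatten(2).transpose(1, 2)  # (N, num_patches, D)

class ViT(nn.Module):
    """Tiny ViT encoder usable as a feature extractor for a recognition decoder."""
    def __init__(self, embed_dim=384, depth=6, num_heads=6, **patch_kwargs):
        super().__init__()
        self.patch_embed = PatchEmbedding(embed_dim=embed_dim, **patch_kwargs)
        self.cls_token = nn.Parameter(torch.zeros(1, 1, embed_dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, self.patch_embed.num_patches + 1, embed_dim))
        layer = nn.TransformerEncoderLayer(embed_dim, num_heads, dim_feedforward=4 * embed_dim, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, depth)

    def forward(self, x):
        x = self.patch_embed(x)
        cls = self.cls_token.expand(x.shape[0], -1, -1)
        x = torch.cat([cls, x], dim=1) + self.pos_embed
        return self.encoder(x)  # (N, 1 + num_patches, D): features for a decoder

feats = ViT()(torch.rand(2, 3, 32, 128))
print(feats.shape)  # torch.Size([2, 257, 384])
```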
Hi,
OK for timm.
Another question: can we add 'pytorch-lightning~=1.6.5' to requirements-pt.txt?
Hi @nikokks, why do you think we need this? :) We should implement all models in plain PyTorch / TensorFlow without any wrappers.
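In other words, training stays an explicit loop rather than a LightningModule. A minimal sketch with a placeholder model and dataset, just to illustrate the "no wrappers" idea:

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

model = nn.Linear(10, 2)  # stand-in for a recognition model
loader = DataLoader(
    TensorDataset(torch.rand(64, 10), torch.randint(0, 2, (64,))),
    batch_size=8,
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# The usual explicit PyTorch training loop, no framework wrapper needed
model.train()
for epoch in range(2):
    for inputs, targets in loader:
        optimizer.zero_grad()
        loss = criterion(model(inputs), targets)
        loss.backward()
        optimizer.step()
```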
OK, that sounds good to me :)
I have added a ParSeq class on my fork. Now I need to match all the arguments of each method between your wrapper and the ParSeq model class (the most difficult part) =) I'll do it in the next few days or next weekend.
@nikokks I would suggest the following steps (each should be its own PR):
- move models/recognition/transformer to models/modules/transformer (to reuse the implementations; we will need this for many more models, so it should be more global, @frgfm WDYT?)
- implement ViT as a classification model
- implement the ParSeq decoder side, which uses ViT as the feature extractor (a simplified sketch follows below)
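To illustrate that decoder step: a standard transformer decoder cross-attending to ViT features. The real ParSeq uses permuted autoregressive training; this plain causal version only shows how the pieces connect, and all names and sizes are illustrative.

```python
import torch
from torch import nn

class RecognitionDecoder(nn.Module):
    """Simplified decoder: token embeddings cross-attend to encoder features."""
    def __init__(self, vocab_size=100, embed_dim=384, num_heads=6, depth=1, max_len=32):
        super().__init__()
        self.tok_embed = nn.Embedding(vocab_size, embed_dim)
        self.pos_embed = nn.Parameter(torch.zeros(1, max_len, embed_dim))
        layer = nn.TransformerDecoderLayer(embed_dim, num_heads, dim_feedforward=4 * embed_dim, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, depth)
        self.head = nn.Linear(embed_dim, vocab_size)

    def forward(self, tokens, memory):
        # tokens: (N, T) int ids, memory: (N, S, D) features from the ViT encoder
        seq_len = tokens.shape[1]
        x = self.tok_embed(tokens) + self.pos_embed[:, :seq_len]
        causal = nn.Transformer.generate_square_subsequent_mask(seq_len)
        return self.head(self.decoder(x, memory, tgt_mask=causal))

memory = torch.rand(2, 257, 384)  # e.g. output of the ViT sketch above
tokens = torch.zeros(2, 10, dtype=torch.long)
logits = RecognitionDecoder()(tokens, memory)
print(logits.shape)  # torch.Size([2, 10, 100])
```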
> move models/recognition/transformer to models/modules/transformer (to reuse the implementations; we will need this for many more models, so it should be more global, @frgfm WDYT?)
I agree :+1:
> implement ViT as a classification model
Yup, but let's give credit to the rightful contributors / sources of inspiration where relevant!
@nikokks Now you can reuse the already implemented transformer parts for ViT :+1:
Hi @nikokks, short update: I have not forgotten this. I will (hopefully) start with ViTSTR next week; then it should be easy to implement the decoder from ParSeq as well.
Hi @nikokks, are you still interested in adding ParSeq? :) After #1063, ViT has finally found its way into doctr as a backbone. And #1055 will be updated soon (it could also serve as a template for ParSeq).
Hello, I am currently implementing ParSeq. It is now working with quite good predictions :) Do you have any advice on making a good pull request? Best
Hey @nikokks,
You can take a look at https://github.com/mindee/doctr/pull/1055/files, which I think is a good template (implementation, tests, docs) for implementing ParSeq in doctr :) Otherwise, open a PR and we will guide you step by step.
PS: If you only have the PT implementation, that's fine; we can port it to TF later :)
This will be finished once #1227 and #1228 are merged :)