
Add support for GOT-OCR2.0

Open VladOS95-cyber opened this issue 1 year ago • 5 comments

Model description

As an OCR-2.0 model, GOT can handle all artificial optical signals (e.g., plain texts, math/molecular formulas, tables, charts, sheet music, and even geometric shapes) under various OCR tasks. On the input side, the model supports commonly used scene- and document-style images in slice and whole-page styles. On the output side, GOT can generate plain or formatted results (markdown/tikz/smiles/kern) via an easy prompt. Besides, the model enjoys interactive OCR features, i.e., region-level recognition guided by coordinates or colors.

Open source status

  • [X] The model implementation is available
  • [X] The model weights are available

Provide useful links for the implementation

Implementation: https://github.com/Ucas-HaoranWei/GOT-OCR2.0/

Paper: https://arxiv.org/abs/2409.01704

VladOS95-cyber avatar Oct 15 '24 10:10 VladOS95-cyber

Hello! If no one from the core team is already working on this and there is interest in it, I would really love to contribute this model to transformers with some help!

VladOS95-cyber avatar Oct 15 '24 10:10 VladOS95-cyber

Hi @VladOS95-cyber, if you don't mind, can I help you with this issue if you are working on it?

GargDivanshu avatar Oct 16 '24 07:10 GargDivanshu

Hi @GargDivanshu, I don't mind at all. Let's wait for a decision from @qubvel @LysandreJik

VladOS95-cyber avatar Oct 16 '24 07:10 VladOS95-cyber

Hey @VladOS95-cyber @GargDivanshu! I'm planning to start working on it very soon. I'll tag this issue once I've opened a PR for it, if you want to have a look then!

yonigozlan avatar Oct 17 '24 14:10 yonigozlan

cool 🙌

GargDivanshu avatar Oct 17 '24 18:10 GargDivanshu

+1

Youho99 avatar Nov 04 '24 11:11 Youho99

Any movement on this? Looking forward to trying it out

jshcrm avatar Nov 06 '24 06:11 jshcrm

Been testing stepfun's demo code along with the model -- would love to see this in transformers!

plamb-viso avatar Nov 11 '24 16:11 plamb-viso

Hey all! Implementation is well underway, and I'll open a PR in a couple of days for it (the entire Hugging Face team is currently at an off-site). Most likely, only inference will be available initially, and support for fine-tuning will be added if there is strong demand for it.

yonigozlan avatar Nov 11 '24 16:11 yonigozlan

@yonigozlan if you remember, please paste the PR link in this thread, would love to subscribe

plamb-viso avatar Nov 11 '24 16:11 plamb-viso

Hi again! The GOT-OCR PR is live here if you want to follow the progress :)

yonigozlan avatar Nov 13 '24 20:11 yonigozlan

GOT-OCR is now merged in the main branch of Transformers 🤗

yonigozlan avatar Jan 31 '25 21:01 yonigozlan