Add support for GOT-OCR2.0
Model description
As an OCR-2.0 model, GOT can handle all artificial optical signals (e.g., plain texts, math/molecular formulas, tables, charts, sheet music, and even geometric shapes) under various OCR tasks. On the input side, the model supports commonly used scene- and document-style images in slice and whole-page styles. On the output side, GOT can generate plain or formatted results (markdown/tikz/smiles/kern) via an easy prompt. Besides, the model enjoys interactive OCR features, i.e., region-level recognition guided by coordinates or colors.
Open source status
- [X] The model implementation is available
- [X] The model weights are available
Provide useful links for the implementation
Implementation: https://github.com/Ucas-HaoranWei/GOT-OCR2.0/ Paper: https://arxiv.org/abs/2409.01704
Hello, If someone from the core-team is not already working on this, or there is interest it in, I would really love to contribute this model to transformers with some help!
hi @VladOS95-cyber If you don't mind can I help u with this issue, if you are working on it ?
Hi @GargDivanshu, I don't mind at all, let's wait for decision from @qubvel @LysandreJik
Hey @VladOS95-cyber @GargDivanshu ! I'm planning to start working on it very soon, I'll tag this issue once I've opened a PR for it, if you want to have a look then!
cool 🙌
+1
Any movement on this? Looking forward to trying it out
Been testing stepfun's demo code along with the model -- would love to see this in transformers!
Hey all! Implementation is well underway, and I'll open a PR in a couple of days for it (the entire Hugging Face team is currently at an off-site). Most likely, only inference will be available initially, and support for fine-tuning will be added if there is strong demand for it.
@yonigozlan if you remember, please paste the PR link in this thread, would love to subscribe
Hi again! The GOT-OCR PR is live here if you want to follow the progress :)
GOT-OCR is now merged in the main branch of Transformers 🤗