Add support for GOT-OCR2.0

Open VladOS95-cyber opened this issue 1 year ago • 5 comments

Model description

As an OCR-2.0 model, GOT can handle all artificial optical signals (e.g., plain text, math and molecular formulas, tables, charts, sheet music, and even geometric shapes) across a range of OCR tasks. On the input side, the model accepts common scene-style and document-style images, either as slices or as whole pages. On the output side, GOT can generate plain or formatted results (markdown/tikz/smiles/kern) in response to a simple prompt. The model also supports interactive OCR, i.e., region-level recognition guided by coordinates or colors.

Open source status

  • [X] The model implementation is available
  • [X] The model weights are available

Provide useful links for the implementation

  • Implementation: https://github.com/Ucas-HaoranWei/GOT-OCR2.0/
  • Paper: https://arxiv.org/abs/2409.01704

VladOS95-cyber, Oct 15 '24 10:10