Textbox coordinates for OCR

Open sickerin opened this issue 10 months ago • 0 comments

Can Janus Pro return text-box coordinates (bounding boxes) for OCR? From my testing of Janus, and from my understanding of multimodal models in general, they usually cannot produce accurate coordinates. Beyond detecting the text and transcribing it correctly, I would like to crop the individual text lines, which requires reliable boxes. I currently use PaddleOCR for this, but I have been experimenting to see whether LLMs or multimodal models could do it instead.
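To make the cropping requirement concrete, here is a minimal sketch of the step that depends on the coordinates. It assumes the detector (whether an OCR library or a multimodal model) returns each text line as a four-point polygon in pixel coordinates; that format, the helper names `quad_to_rect` and `crop_rows`, and the nested-list image are illustrative assumptions, not part of Janus Pro or any specific library.

```python
import math
from typing import List, Sequence, Tuple

# (left, top, right, bottom); right and bottom are exclusive.
Rect = Tuple[int, int, int, int]


def quad_to_rect(quad: Sequence[Sequence[float]],
                 width: int, height: int, pad: int = 0) -> Rect:
    """Convert a four-point text box into an axis-aligned crop rectangle.

    Hypothetical helper: `quad` is assumed to be four (x, y) points in
    pixel coordinates. The result is padded by `pad` pixels and clamped
    to the image bounds.
    """
    xs = [p[0] for p in quad]
    ys = [p[1] for p in quad]
    left = max(0, math.floor(min(xs)) - pad)
    top = max(0, math.floor(min(ys)) - pad)
    right = min(width, math.ceil(max(xs)) + pad)
    bottom = min(height, math.ceil(max(ys)) + pad)
    return left, top, right, bottom


def crop_rows(image: List[List[int]], rect: Rect) -> List[List[int]]:
    """Crop a row-major image (list of pixel rows) to `rect`."""
    left, top, right, bottom = rect
    return [row[left:right] for row in image[top:bottom]]


# Toy 6x4 "image" whose pixel value encodes its position (y * 10 + x).
image = [[y * 10 + x for x in range(6)] for y in range(4)]
quad = [(1.2, 0.5), (3.8, 0.5), (3.8, 2.1), (1.2, 2.1)]

rect = quad_to_rect(quad, width=6, height=4)
line_crop = crop_rows(image, rect)
```

Because the crop is computed directly from the predicted corners, any error in the coordinates clips or misses part of the line, which is why accuracy matters here and not just correct transcription.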

Feb 04 '25 06:02 sickerin