Janus
Textbox coordinates for OCR
Can Janus Pro return text box coordinates for OCR? From my testing of Janus and my understanding of multimodal models, they generally cannot produce accurate coordinates. Beyond detecting the text and parsing it correctly, I would like to crop the individual text lines. I currently use PaddleOCR, but I have been experimenting to see whether LLMs or multimodal models could do this.
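For context, the cropping step I need is simple once reliable boxes exist. Here is a minimal sketch, assuming each detected box is a four-point polygon in pixel coordinates. The names `quad_to_rect` and `crop_rows` are hypothetical helpers, not part of Janus or PaddleOCR, and the image is a plain list of pixel rows to keep the example dependency-free.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Rect = Tuple[int, int, int, int]  # (left, top, right, bottom), right/bottom exclusive


def quad_to_rect(quad: Sequence[Point], width: int, height: int, pad: int = 0) -> Rect:
    """Return the axis-aligned rectangle enclosing a 4-point box.

    The rectangle is optionally padded and then clamped to the image bounds.
    """
    xs = [p[0] for p in quad]
    ys = [p[1] for p in quad]
    left = max(0, math.floor(min(xs)) - pad)
    top = max(0, math.floor(min(ys)) - pad)
    right = min(width, math.ceil(max(xs)) + pad)
    bottom = min(height, math.ceil(max(ys)) + pad)
    return left, top, right, bottom


def crop_rows(image: List[List[int]], rect: Rect) -> List[List[int]]:
    """Crop a row-major image (list of pixel rows) to the given rectangle."""
    left, top, right, bottom = rect
    return [row[left:right] for row in image[top:bottom]]


# Toy 4x6 image where each pixel encodes its own position as y * 10 + x.
image = [[y * 10 + x for x in range(6)] for y in range(4)]

# Hypothetical slightly skewed text-line box (assumed format: four corner points).
quad = [(1.2, 0.5), (4.0, 0.8), (3.9, 2.1), (1.0, 2.4)]

rect = quad_to_rect(quad, width=6, height=4)
line_crop = crop_rows(image, rect)
```

With a real image, the same `(left, top, right, bottom)` tuple could be passed to Pillow's `Image.crop`. So the open question is only whether the model can produce boxes accurate enough for this step.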