
[Feature Request]: OCR capability with DocAgent

Open • marklysze opened this issue 8 months ago • 2 comments

Is your feature request related to a problem? Please describe.

DocAgent doesn't have OCR capabilities. OCR is definitely needed for PDFs, and it could also be useful for images (so someone can ask questions about an image).

MistralOCR has a low-cost PDF-to-Markdown endpoint that is quite effective. The best results I've found so far come from Gemini 2.5 Pro: I use pdf2image to convert the PDF to images (200 dpi), then MultiModalConversableAgent to convert each page to Markdown, and finally combine the pages.
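A minimal sketch of the rasterize, convert-per-page, combine pipeline described above, assuming pdf2image's `convert_from_path` (which needs Poppler installed and returns one PIL image per page). The per-page model call is deliberately left as a plug-in: `page_to_markdown` is a hypothetical callable that, in the setup above, would wrap a MultiModalConversableAgent request to Gemini 2.5 Pro. The helper names and the `<!-- page N -->` separators are illustrative choices, not part of DocAgent.

```python
from typing import Any, Callable, Iterable, List

# Hypothetical signature: takes one page image and returns that page as Markdown.
# In the setup described above this would wrap a MultiModalConversableAgent
# call to Gemini 2.5 Pro; here it is left as a plug-in.
PageToMarkdown = Callable[[Any], str]


def combine_pages(page_markdown: Iterable[str]) -> str:
    """Join per-page Markdown in order, tagging each page with an HTML comment."""
    parts = [
        f"<!-- page {number} -->\n{markdown.strip()}"
        for number, markdown in enumerate(page_markdown, start=1)
    ]
    return "\n\n".join(parts)


def images_to_markdown(images: Iterable[Any], page_to_markdown: PageToMarkdown) -> str:
    """Convert each page image to Markdown, then combine the results in page order."""
    return combine_pages(page_to_markdown(image) for image in images)


def pdf_to_markdown(pdf_path: str, page_to_markdown: PageToMarkdown, dpi: int = 200) -> str:
    """Rasterize a PDF with pdf2image (needs Poppler) and convert it page by page."""
    # Imported lazily so the helpers above work without pdf2image installed.
    from pdf2image import convert_from_path

    # convert_from_path returns a list with one PIL image per page.
    images: List[Any] = convert_from_path(pdf_path, dpi=dpi)
    return images_to_markdown(images, page_to_markdown)
```

Keeping the model call behind a single callable means the same combine step could be reused with a different backend (for example, a whole-document OCR endpoint that already returns Markdown per page) without touching the rasterization code.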

Describe the solution you'd like

No response

Additional context

No response

marklysze, May 13 '25 19:05

@qingyun-wu @marklysze @sonichi Is this still relevant?

Lancetnik, Aug 06 '25 20:08

@marklysze Please give us an update on this. Has OCR support been added already?

qingyun-wu, Aug 09 '25 22:08