ag2
[Feature Request]: OCR capability with DocAgent
Is your feature request related to a problem? Please describe.
DocAgent doesn't have OCR capabilities. This is definitely needed for PDFs, and it would also be useful for images (so someone can ask questions about an image).
Mistral OCR has a low-cost PDF-to-markdown endpoint that is quite effective. In my experience Gemini 2.5 Pro gives the best results: use pdf2image to convert the PDF to images (200 dpi), then MultimodalConversableAgent to convert each page to markdown, and finally combine the pages into one document.
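A rough sketch of the page-by-page approach described above, to make the request concrete. This is not DocAgent's API: `render_pdf_pages`, `pages_to_markdown`, and the `transcribe_page` callable are hypothetical names used for illustration. The only real library call is `pdf2image.convert_from_path`, which needs poppler installed. The transcription step is left as a plain callable so it could wrap a multimodal agent, Mistral OCR, or anything else that turns a page image into markdown.

```python
from typing import Callable, Iterable, List


def render_pdf_pages(pdf_path: str, dpi: int = 200) -> List[object]:
    """Render each PDF page to a PIL image (hypothetical helper)."""
    # Imported lazily because pdf2image (and poppler) is an optional dependency.
    from pdf2image import convert_from_path

    return convert_from_path(pdf_path, dpi=dpi)


def pages_to_markdown(
    pages: Iterable[object],
    transcribe_page: Callable[[object], str],
) -> str:
    """Transcribe each page to markdown and join the results in page order.

    `transcribe_page` is an assumption: any callable that takes one page
    image and returns markdown text for it.
    """
    parts = []
    for number, page in enumerate(pages, start=1):
        text = transcribe_page(page).strip()
        if text:  # skip pages that produced no text
            parts.append(f"<!-- page {number} -->\n{text}")
    return "\n\n".join(parts)
```

Usage would look something like `pages_to_markdown(render_pdf_pages("report.pdf"), my_transcriber)`, where `my_transcriber` is whatever sends one image to the vision model and returns its reply. The page-marker comments are optional; they just make it easier to trace text in the combined markdown back to its source page.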
Describe the solution you'd like
No response
Additional context
No response
@qingyun-wu @marklysze @sonichi is this still relevant?
@marklysze Could you give us an update on this? Has OCR support been added yet?