Add `DocumentCaptioner`: takes in Image Documents and returns the same Documents enhanced with a text description of the image

Open sjrl opened this issue 6 months ago • 0 comments

Currently, we can only process Image Documents by embedding them with the new ImageEmbedder.

However, another approach would be to caption the image or extract the text from a scanned PDF. In this case, it would be great to have a component like a `DocumentCaptioner` that takes in a list of Image Documents and enhances each one with a text description, produced either via OCR or via an LLM with vision capabilities.

It's possible we may want separate components for the OCR and Vision LLM approaches.
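
To make the idea concrete, here is a rough sketch of what the interface might look like. It is plain Python and not tied to any existing API: the `ImageDocument` dataclass, the `file_path` meta key, the `describe` callable, and the `run` return shape are all assumptions made for illustration. The `describe` callable is where an OCR engine or a vision LLM call would be plugged in, which is one way to cover both approaches with a single component.

```python
from dataclasses import dataclass, field, replace
from typing import Any, Callable, Dict, List, Optional


@dataclass
class ImageDocument:
    """Hypothetical stand-in for a Document that points at an image file."""
    content: Optional[str] = None
    meta: Dict[str, Any] = field(default_factory=dict)


class DocumentCaptioner:
    """Sketch: fills in `content` for image Documents via a pluggable describer."""

    def __init__(self, describe: Callable[[str], str]):
        # `describe` maps an image file path to text. It could wrap an OCR
        # engine or a vision LLM call; both backends are assumptions here.
        self.describe = describe

    def run(self, documents: List[ImageDocument]) -> Dict[str, List[ImageDocument]]:
        captioned = []
        for doc in documents:
            path = doc.meta.get("file_path")  # assumed meta key
            if path is None or doc.content:
                # No image to describe, or text already present: pass through.
                captioned.append(doc)
                continue
            # Return a copy with the description set; the input is not mutated.
            captioned.append(replace(doc, content=self.describe(path)))
        return {"documents": captioned}


def fake_describe(path: str) -> str:
    """Placeholder describer so the sketch runs without any model or OCR engine."""
    return f"description of {path}"


docs = [ImageDocument(meta={"file_path": "scan.png"})]
result = DocumentCaptioner(describe=fake_describe).run(docs)
```

If we end up with separate OCR and Vision LLM components, each could be a thin wrapper that fixes the `describe` backend while keeping the same `run` signature.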

Jun 13 '25 13:06 sjrl