haystack
Add `DocumentCaptioner`: takes in image Documents and returns the same Documents enriched with a text description of the image
Currently, the only way to process image Documents is to embed them with the new ImageEmbedder.
However, another approach would be to caption the image or extract the text from a scanned PDF. For this, it would be useful to have a component such as a DocumentCaptioner that takes in a list of image Documents and enriches them with a text description, produced either via OCR or via an LLM with vision capabilities.
We may want separate components for the OCR and vision LLM approaches.
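A minimal plain-Python sketch of the proposed interface, assuming a pluggable captioning backend so the same component can wrap either OCR or a vision LLM. Everything here is hypothetical: the `Document` class is a simplified stand-in (not Haystack's real `Document`), and the `image` field, `backend` callable, and `meta_key` parameter are illustrative names only.

```python
from dataclasses import dataclass, field, replace
from typing import Any, Callable, Dict, List, Optional


@dataclass
class Document:
    # Hypothetical stand-in for a Haystack Document, not the real class.
    content: Optional[str] = None
    meta: Dict[str, Any] = field(default_factory=dict)
    image: Optional[bytes] = None  # hypothetical field holding raw image bytes


# A captioning backend maps image bytes to text (e.g. an OCR engine or a vision LLM).
CaptionBackend = Callable[[bytes], str]


class DocumentCaptioner:
    """Hypothetical component that adds a text description to each image Document."""

    def __init__(self, backend: CaptionBackend, meta_key: str = "caption"):
        self.backend = backend
        self.meta_key = meta_key

    def run(self, documents: List[Document]) -> Dict[str, List[Document]]:
        out: List[Document] = []
        for doc in documents:
            if doc.image is None:
                # Documents without an image are passed through unchanged.
                out.append(doc)
                continue
            caption = self.backend(doc.image)
            # Copy meta so the input Document is not mutated.
            new_meta = {**doc.meta, self.meta_key: caption}
            # Use the caption as content only if the Document has no text yet.
            content = doc.content if doc.content else caption
            out.append(replace(doc, content=content, meta=new_meta))
        return {"documents": out}
```

Keeping the backend as a plain callable means an OCR captioner and a vision-LLM captioner would differ only in the function passed in; splitting them into two separate components, as suggested above, would mean fixing `backend` inside each class instead.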