Create Image Indexing Example

Open sjrl opened this issue 8 months ago • 2 comments

In addition to the ImageFileToImageContent and PDFToImageContent components, we should add an indexing example that shows how to use these conversion components to convert image files into Haystack Documents and then write those into a database.

For inspiration we should consult https://github.com/deepset-ai/dc-pipeline-templates/blob/main/templates/Vision_gpt4o_en_indexing.yaml

This requires both the ImageFileToImageContent and PDFToImageContent converters, plus some additional components (e.g. ChatPromptBuilders and ChatGenerators) to perform image captioning with an LLM, and then further components to convert the image caption plus ImageContent.meta back into a Haystack Document. We may therefore want to consider adding a DocumentBuilder component to help with this last step.
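To make the last step concrete, here is a minimal sketch of what a DocumentBuilder might do: pair each LLM caption with the meta of the image it was generated from. It is not an existing Haystack component. `ImageContentStub`, `DocumentStub`, and `build_documents` are hypothetical names, and the two dataclasses are plain-Python stand-ins for Haystack's ImageContent and Document so the snippet runs without Haystack installed.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class ImageContentStub:
    """Hypothetical stand-in for Haystack's ImageContent (only the fields used here)."""
    base64_image: str
    meta: Dict[str, Any] = field(default_factory=dict)


@dataclass
class DocumentStub:
    """Hypothetical stand-in for Haystack's Document."""
    content: str
    meta: Dict[str, Any] = field(default_factory=dict)


def build_documents(
    captions: List[str], images: List[ImageContentStub]
) -> List[DocumentStub]:
    """Pair each caption with the meta of the image it describes.

    Assumes captions and images arrive in the same order, one caption per image.
    """
    if len(captions) != len(images):
        raise ValueError(
            f"Got {len(captions)} captions for {len(images)} images; expected one caption per image."
        )
    # Copy the meta dict so later edits to a document do not mutate the image's meta.
    return [
        DocumentStub(content=caption.strip(), meta=dict(image.meta))
        for caption, image in zip(captions, images)
    ]
```

In a real pipeline this logic would sit between the ChatGenerator that produces the captions and the writer that stores the resulting Documents; how the captions are extracted from the generator's replies is left open here.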

This would be valuable both for exploring the full flow and as material to share with users.

sjrl, Apr 29 '25 10:04