[REQUEST] Add image captioning

Open robomotic opened this issue 11 months ago • 0 comments

Reference Issues

No response

Summary

When a document is decomposed and its images are extracted, a caption could also be generated for each image and stored as text in the document. This should give better results, because the image content would then be represented as text.

Basic Example

We could support BLIP and the OpenAI Vision API initially.
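
To make the idea concrete, here is a rough sketch of how pluggable captioning backends might look. All names here (`Captioner`, `ExtractedImage`, `StubCaptioner`, `caption_images`) are hypothetical and not part of the existing codebase. The stub backend only stands in for a real BLIP or OpenAI Vision implementation so that the sketch runs without a model or API key.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Protocol


class Captioner(Protocol):
    """Hypothetical interface a BLIP or OpenAI Vision backend could implement."""

    def caption(self, image_bytes: bytes) -> str: ...


@dataclass
class ExtractedImage:
    """Hypothetical container for an image pulled out of a document."""

    name: str
    data: bytes


class StubCaptioner:
    """Placeholder backend; a real one would call a model or an API here."""

    def caption(self, image_bytes: bytes) -> str:
        return f"image of {len(image_bytes)} bytes"


def caption_images(
    images: list[ExtractedImage], captioner: Captioner
) -> dict[str, str]:
    """Return a mapping from image name to its generated caption."""
    return {img.name: captioner.caption(img.data) for img in images}
```

With an interface like this, adding another backend later would only require one more class with a `caption` method.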

Drawbacks

Since the caption text doesn't exist in the original file, we should clearly mark it as describing an image.
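
One possible way to do that, as a sketch only: wrap the caption in a visible marker and record its origin in metadata. The `TextChunk` class, the `[Image caption: ...]` marker, and the metadata keys are all assumptions for illustration, not an existing format in the project.

```python
from dataclasses import dataclass, field


@dataclass
class TextChunk:
    """Hypothetical unit of stored document text."""

    text: str
    metadata: dict = field(default_factory=dict)


def make_caption_chunk(image_name: str, caption: str) -> TextChunk:
    """Build a chunk whose text and metadata both flag it as image-derived."""
    return TextChunk(
        # Visible marker so readers can tell this text was generated.
        text=f"[Image caption: {caption}]",
        # Machine-readable origin so downstream code can filter or style it.
        metadata={"source": "image_caption", "image": image_name},
    )
```

Keeping the marker in both places means the distinction survives even if a consumer drops the metadata or only looks at the raw text.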

Additional information

No response

Jan 12 '25 10:01 robomotic