docling
Picture Description in Output
Question
I am using PictureDescriptionApiOptions to generate a description of each image. I want to replace the image placeholders in the output (export_to_) with the picture description. Currently there are only three ImageRefMode values available (EMBEDDED, PLACEHOLDER, REFERENCED), and none of them adds the description of the image.
Is there a way to do so? I see that the descriptions are currently stored as annotations.
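Until something built-in exists, one possible workaround is to post-process the exported Markdown. The sketch below is not part of docling: fill_placeholders is a hypothetical helper, and it assumes that the export marks each picture with the string "<!-- image -->" and that you have already collected the descriptions from the annotations, in document order.

```python
from __future__ import annotations

import re

# Assumption: the exported Markdown marks each picture with this placeholder.
IMAGE_PLACEHOLDER = "<!-- image -->"


def fill_placeholders(
    markdown: str,
    descriptions: list[str | None],
    placeholder: str = IMAGE_PLACEHOLDER,
) -> str:
    """Replace the n-th placeholder with the n-th description, in document order."""
    remaining = iter(descriptions)

    def substitute(match: re.Match) -> str:
        description = next(remaining, None)
        # Keep the placeholder when this picture has no description.
        if not description:
            return match.group(0)
        return f"[Image: {description}]"

    return re.sub(re.escape(placeholder), substitute, markdown)
```

Pictures without a description (None or an empty string) keep their placeholder, so the positions stay aligned as long as the list has one entry per picture.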
We are planning to address this with custom serializers for picture items: some use cases need the description, others the text produced by OCR, others the graph data, etc. Some initial work on this should land in the coming days.
@dolfim-ibm If I understand correctly, that means we will be able to combine multiple elements in the output for picture items? For example: description (given by a VLM) + text from OCR + reference + ...?
That could be a very good generic solution!
Yes, that is what we would like to allow. Each use case clearly needs a different output, so instead of overloading the default output with content, we are thinking of making it modular and customizable.
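To make the idea of a modular picture output concrete, here is a minimal sketch in plain Python. It does not use docling's serializer API: Picture, the *_part functions, and serialize_picture are all hypothetical names chosen for illustration. Each part renders one aspect of a picture, and the caller picks which parts to combine and in what order.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class Picture:
    # Hypothetical stand-in for a picture item; not docling's data model.
    description: Optional[str] = None  # e.g. produced by a VLM
    ocr_text: Optional[str] = None     # text recognized inside the image
    uri: Optional[str] = None          # where the image file was saved


# A part turns a picture into a text fragment, or None if it has nothing to add.
Part = Callable[[Picture], Optional[str]]


def description_part(pic: Picture) -> Optional[str]:
    return f"Description: {pic.description}" if pic.description else None


def ocr_part(pic: Picture) -> Optional[str]:
    return f"Text in image: {pic.ocr_text}" if pic.ocr_text else None


def reference_part(pic: Picture) -> Optional[str]:
    return f"![image]({pic.uri})" if pic.uri else None


def serialize_picture(
    pic: Picture,
    parts: Sequence[Part],
    fallback: str = "<!-- image -->",
) -> str:
    """Join the fragments from the selected parts; use the fallback if all are empty."""
    rendered = [text for part in parts if (text := part(pic))]
    return "\n\n".join(rendered) if rendered else fallback
```

With this shape, one use case can call serialize_picture(pic, [description_part]) while another passes [description_part, ocr_part, reference_part], without changing the picture data itself.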
@dolfim-ibm It would be nice if something similar were available for tables, with modes such as referenced (picture of the table saved), embedded (picture embedded), extracted (OCR pull), and placeholder (??). Allowing either OCR or a VLM to do the conversion would also be a nice addition.