Christoph Auer
This was merged on a derived PR.
@rateixei I see that the CI tests are failing. Can you please re-generate the test GT? We need to see if it passes after that. Also, please rebase from `main`.
Note: We will hold off on merging this until the design proposal for inline styles is implemented: https://github.com/DS4SD/docling/discussions/894
@Nowheresly Thanks for providing a sample for this edge case. We are actively working on this topic, so stay tuned for future updates.
After a closer look, @JeandeBalzac, your issue does not appear to be connected to portrait layout. It is simply because there are many elements identified as figures, and these will export...
@JeandeBalzac if you have more affected PDFs, please attach them here. We need to analyze this problem more broadly.
@benzhang-se The core problem is representing and extracting picture contents. We are actively working on creating datasets and models for this purpose. Once available it will be announced in the...
I am closing this issue, since we decided that "approximate" pagination in Word is not feasible to include.
TODO
- [x] Put DoclingParseV1DocumentBackend back, mark as deprecated
- [x] Correct handling of `BoundingRectangle.to_bounding_box()` when text cells are rotated, instead of fixing it in `get_text_cells`.
- [x] Add pipeline...
@samhita-alla I can reproduce this with Docling 2.17.0 and confirm that most of the content is detected as pictures only. This will need deeper analysis of the layout model.