docling
fix: find paragraphs in elements with images in docx
When using the MsWordDocumentBackend, some text is missed: paragraphs inside elements that also contain images are not found. An example docx file where this happens is attached: paragraph_in_image.docx
The pragmatic solution is to attempt to add text elements even when a drawing expression is found.
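A minimal sketch of that idea, not the actual docling code: it uses only the standard library's `xml.etree.ElementTree` on a hand-written WordprocessingML fragment. The function name `extract_items` and the `(kind, text)` tuples are hypothetical, chosen for illustration. A paragraph that contains a `w:drawing` still has its text collected, and empty text is skipped so that no empty text paragraphs are produced.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W_NS}

# Hand-written fragment (assumption): three paragraphs, two of them with drawings.
SAMPLE = f"""
<w:body xmlns:w="{W_NS}">
  <w:p>
    <w:r><w:drawing/></w:r>
    <w:r><w:t>Caption next to the image</w:t></w:r>
  </w:p>
  <w:p>
    <w:r><w:drawing/></w:r>
  </w:p>
  <w:p>
    <w:r><w:t>Plain paragraph</w:t></w:r>
  </w:p>
</w:body>
""".strip()


def extract_items(body):
    """Hypothetical helper: return (kind, text) tuples for each paragraph."""
    items = []
    for p in body.findall("w:p", NS):
        has_drawing = p.find(".//w:drawing", NS) is not None
        # Collect text from every w:t descendant, drawing or not.
        text = "".join(t.text or "" for t in p.iter(f"{{{W_NS}}}t")).strip()
        if has_drawing:
            items.append(("picture", ""))
        # Do not stop at the drawing: still add the text, but only if non-empty.
        if text:
            items.append(("text", text))
    return items


items = extract_items(ET.fromstring(SAMPLE))
for kind, text in items:
    print(kind, repr(text))
```

The drawing-only paragraph yields a picture item and no text item, which is the guard against empty text paragraphs mentioned later in this thread.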
Checklist:
- [ ] Documentation has been updated, if necessary.
- [ ] Examples have been added, if necessary.
- [ ] Tests have been added, if necessary.
Merge Protections
Your pull request matches the following merge protections and will not be merged until they are valid.
🔴 Require two reviewer for test updates
This rule is failing.
When test data is updated, we require two reviewers
- [ ] `#approved-reviews-by >= 2`
🟢 Enforce conventional commit
Wonderful, this rule succeeded.
Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/
- [X] `title ~= ^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?(!)?:`
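For reference, the title rule above can be checked locally. This is a small sketch that assumes the pattern is applied with Python-style regular-expression search semantics; the helper name `is_conventional` and the sample titles other than this PR's own title are made up for illustration.

```python
import re

# Pattern copied from the merge protection rule above.
CONVENTIONAL_TITLE = re.compile(
    r"^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?(!)?:"
)


def is_conventional(title: str) -> bool:
    """Hypothetical helper: True if the title starts with a conventional commit prefix."""
    return CONVENTIONAL_TITLE.search(title) is not None


titles = [
    "fix: find paragraphs in elements with images in docx",  # this PR's title
    "feat(docx)!: handle text boxes",  # made-up title with scope and breaking marker
    "Fix paragraphs in docx",  # made-up title without a valid prefix
]
for title in titles:
    print(is_conventional(title), title)
```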
@Manuel030 Thank you for the PR! Could you add this document as a test?
@PeterStaar-IBM Sure
@Manuel030 @maxmnemonic There appears to be a newer PR with the same goal, https://github.com/docling-project/docling/pull/1610, which includes the proper condition to avoid producing empty text paragraphs.
Closing this; superseded by https://github.com/docling-project/docling/pull/1610