feat: use `w:lastRenderedPageBreak` to get approximate pagination from docx

Open dhdaines opened this issue 11 months ago • 2 comments

Sometimes (though very far from always) we can get pagination from a Word document using the `w:lastRenderedPageBreak` element (https://ooxml.info/docs/17/17.3/17.3.3/17.3.3.13/). This PR adds support for that. Note that the resulting pagination is very approximate, and the provenance ~and page~ objects created are known to be inaccurate.
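
To illustrate the idea (this is not the code from this PR), here is a minimal sketch that walks the main document XML in order, bumps a page counter at each `w:lastRenderedPageBreak`, and assigns each paragraph the page on which its first text run appears. The names `approximate_pages`, `read_document_xml`, and `SAMPLE` are hypothetical, and the sketch assumes the main part lives at `word/document.xml`, which is typical but not guaranteed.

```python
import xml.etree.ElementTree as ET
import zipfile

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
P_TAG = f"{{{W_NS}}}p"
T_TAG = f"{{{W_NS}}}t"
BREAK_TAG = f"{{{W_NS}}}lastRenderedPageBreak"


def read_document_xml(docx_path: str) -> str:
    """Read the main document part (assumed to be word/document.xml)."""
    with zipfile.ZipFile(docx_path) as zf:
        return zf.read("word/document.xml").decode("utf-8")


def approximate_pages(document_xml: str) -> list[tuple[int, str]]:
    """Return (approximate page number, text) for each paragraph.

    Pages are only as good as the last render that saved the file;
    documents never rendered by Word may contain no breaks at all.
    """
    root = ET.fromstring(document_xml)
    page = 1
    paragraphs = []  # each entry: [page, text, has_text]
    for el in root.iter():  # document order, single pass
        if el.tag == P_TAG:
            paragraphs.append([page, "", False])
        elif el.tag == BREAK_TAG:
            page += 1
        elif el.tag == T_TAG and paragraphs:
            current = paragraphs[-1]
            if not current[2]:
                # First text in this paragraph: it starts on the current page.
                current[0] = page
                current[2] = True
            current[1] += el.text or ""
    return [(p, text) for p, text, _ in paragraphs]


# Hand-written sample: a break before "Second" and one in the middle of "Third".
SAMPLE = (
    f'<w:document xmlns:w="{W_NS}"><w:body>'
    "<w:p><w:r><w:t>First</w:t></w:r></w:p>"
    "<w:p><w:r><w:lastRenderedPageBreak/><w:t>Second</w:t></w:r></w:p>"
    "<w:p><w:r><w:t>Third a</w:t></w:r>"
    "<w:r><w:lastRenderedPageBreak/><w:t> b</w:t></w:r></w:p>"
    "<w:p><w:r><w:t>Fourth</w:t></w:r></w:p>"
    "</w:body></w:document>"
)
```

Calling `approximate_pages(SAMPLE)` places a paragraph that straddles a break on the page where it starts, which is one reason the result can only be approximate.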

  • [x] Documentation has been updated, if necessary.
  • [x] Examples have been added, if necessary.
  • [x] Tests have been added, if necessary.

dhdaines avatar Jan 29 '25 13:01 dhdaines

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🔴 Require two reviewers for test updates

This rule is failing.

When test data is updated, we require two reviewers

  • [ ] #approved-reviews-by >= 2

🟢 Enforce conventional commit

Wonderful, this rule succeeded.

Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/

  • [X] title ~= ^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?(!)?:

mergify[bot] avatar Jan 29 '25 13:01 mergify[bot]

Rebased / fixed signoff / force-pushed!

dhdaines avatar Feb 03 '25 17:02 dhdaines

hello! is anything still required for this to get a review?

dhdaines avatar Mar 12 '25 15:03 dhdaines

@dhdaines Thanks for this contribution. We would prefer not to put approximate pagination information in the output document, but rather go with an unpaginated document if there is no reliable source of information. I will close this PR.

cau-git avatar Mar 31 '25 09:03 cau-git