mammoth.js
Pagination of big docx files
It would be good to be able to convert a big docx file in chunks (a few pages at a time).
+1
For now, no page info is returned by the convertToHtml function.
I too have a need to treat each page of a Word document as an HTML page.
After reading your code, would this be solved by a style rule of
"br[type='page'] => div.page:fresh"
and then splitting the output on
<div class="page"></div>
or whatever element you choose?
It would need an option like ignorePageBreak to change the value of ignoreElements in docx/body-reader.js. Of course, it may be more complicated than that.
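To make the splitting step concrete, here is a minimal sketch. It assumes the converted HTML already contains an empty <div class="page"></div> at every page break, which is exactly what the style rule above is hoped to produce. splitIntoPages and PAGE_MARKER are hypothetical names for illustration, not part of mammoth.js.

```javascript
// Hypothetical helper, not part of mammoth.js.
// Assumption: the HTML string contains PAGE_MARKER at each page break,
// e.g. as the result of the style rule suggested above.
const PAGE_MARKER = '<div class="page"></div>';

// Split the HTML on the marker and drop empty chunks
// (for example, when the document ends with a page break).
function splitIntoPages(html, marker = PAGE_MARKER) {
  return html
    .split(marker)
    .map((chunk) => chunk.trim())
    .filter((chunk) => chunk.length > 0);
}

// Usage with a hand-written sample instead of real converter output.
const sample =
  '<p>Page one</p><div class="page"></div><p>Page two</p><div class="page"></div>';
const pages = splitIntoPages(sample);
pages.forEach((page, index) => {
  console.log(`page ${index + 1}: ${page}`);
});
```

This is a plain string split, so if the marker ends up nested inside another element (a paragraph, a table cell), the chunks will not be well-formed HTML on their own. It also does nothing about memory use during conversion, since the whole document is still converted before it is split.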
Defo this is needed for my team :+1:
This would be extremely useful for our team: having page number metadata would help GPT parse our documents properly.