mammoth.js Pagination of big docx files

Open stalniy opened this issue 6 years ago • 4 comments

It would be good to be able to convert a big docx file in chunks (a few pages at a time).

Nov 06 '18 08:11 stalniy

+1. For now, no page info is returned by the convertToHtml function.

Nov 10 '18 08:11 kennylbj

I too have a need to treat each page of a Word document as an HTML page.

After reading your code, would this be solved by a style rule of

"br[type='page'] => div.page:fresh"

and then split the output with

<div class="page"></div>

or whatever element you choose.

It would need an option like ignorePageBreak to change the value of ignoreElements in docx/body-reader.js. Of course, it may be more complicated than that.
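
To make the splitting step concrete, here is a minimal sketch. It assumes the converted HTML string really does contain a marker element at each page break; whether mammoth emits an empty div for the mapping above is not confirmed here. splitPages and its marker parameter are hypothetical names, not part of mammoth.

```javascript
// Hypothetical helper (not part of mammoth): split an HTML string into
// per-page chunks on a marker element assumed to appear at each page break.
function splitPages(html, marker = '<div class="page"></div>') {
  return html
    .split(marker)                        // cut at every marker occurrence
    .map((chunk) => chunk.trim())         // drop surrounding whitespace
    .filter((chunk) => chunk.length > 0); // skip empty chunks (e.g. a trailing break)
}

// Example input standing in for the string produced by the conversion.
const sampleHtml = '<p>Page one</p><div class="page"></div><p>Page two</p>';
const pages = splitPages(sampleHtml);
console.log(pages);
```

If a different element ends up marking the page break, pass its exact serialized form as the second argument.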

Aug 18 '20 18:08 pboysen

This is definitely needed for my team :+1:

Mar 14 '23 17:03 theZappr

This would be extremely useful for our team: having page number metadata would help GPT parse our documents properly.

Apr 24 '23 21:04 motivatedclay