mammoth.js

Pagination of big docx files

Open stalniy opened this issue 6 years ago • 4 comments

It would be good to be able to convert a big docx file in chunks (a few pages at a time).

stalniy avatar Nov 06 '18 08:11 stalniy

+1. For now, no page info is returned by the convertToHtml function.

kennylbj avatar Nov 10 '18 08:11 kennylbj

I too have a need to treat each page of a Word document as an HTML page.

After reading your code, would this be solved by a style rule of

"br[type='page'] => div.page:fresh"

and then split the output with

<div class="page"></div>

or whatever element you choose.

It would need an option like ignorePageBreak to change the value in docx/body-reader.js/ignoreElements. Of course, it may be more complicated than that.
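To make the idea concrete, here is a rough sketch of the splitting step. The helper names splitIntoPages and convertByPage are made up for illustration and are not part of mammoth. It assumes the marker element shows up verbatim in the converted HTML, which I have not verified: mammoth may drop empty elements from its output, in which case a different marker would be needed. It would also only see explicit page breaks in the document, not the pages Word computes when laying out the text.

```javascript
// Marker produced by the style rule "br[type='page'] => div.page:fresh"
// (assumption: the empty div survives in mammoth's output unchanged).
const PAGE_MARKER = '<div class="page"></div>';

// Hypothetical helper: split converted HTML into one chunk per page.
// Chunks that are empty after trimming (e.g. from consecutive breaks) are dropped.
function splitIntoPages(html, marker = PAGE_MARKER) {
  return html
    .split(marker)
    .map((chunk) => chunk.trim())
    .filter((chunk) => chunk.length > 0);
}

// Hypothetical wiring to mammoth. Not executed here; needs the mammoth package.
async function convertByPage(path) {
  const mammoth = require("mammoth");
  const result = await mammoth.convertToHtml(
    { path },
    { styleMap: ["br[type='page'] => div.page:fresh"] }
  );
  // result.value holds the generated HTML string.
  return splitIntoPages(result.value);
}
```

If the empty div turns out not to survive, the same helper can be called with another marker string as its second argument.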

pboysen avatar Aug 18 '20 18:08 pboysen

Defo this is needed for my team :+1:

theZappr avatar Mar 14 '23 17:03 theZappr

This would be extremely useful for our team: having page number metadata would help GPT parse our documents properly.

motivatedclay avatar Apr 24 '23 21:04 motivatedclay