Specify page dimension / orientation when merging external docs

Open jpbourgeon opened this issue 5 years ago • 8 comments

Hello

When I merge a PDF file whose pages differ in size from my document, pages that are not in the document's orientation (landscape/portrait) are truncated.

Do you have a suggestion on how I could get the template page dimensions and use them for a specific page?

jpbourgeon avatar Sep 05 '18 15:09 jpbourgeon

Good point. It should be fairly easy to implement to always add pages from external documents in their original size.

In https://github.com/rkusa/pdfjs/blob/master/lib/external.js#L37

  • if the page has no MediaBox property, set the page's MediaBox property to the MediaBox value of the this.pages object
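
The fallback described in the bullet above could be sketched roughly as follows. This is only an illustration on plain objects, not the library's actual code: the `page` and `pages` shapes and both helper names are assumptions made up for the example. It relies on MediaBox being an inheritable attribute in PDF, so a page that does not define one takes it from its parent page tree node.

```javascript
// Hypothetical helper: returns the MediaBox that applies to a page.
// `page` and `pages` are plain objects standing in for parsed PDF dictionaries.
function resolveMediaBox(page, pages) {
  if (page.MediaBox) return page.MediaBox // the page defines its own size
  if (pages && pages.MediaBox) return pages.MediaBox // inherit from the page tree node
  return null // nothing to fall back to
}

// Hypothetical helper: returns a copy of the page with the inherited
// MediaBox written onto it, so the size survives when the page is moved
// into another document. The input page is left untouched.
function withExplicitMediaBox(page, pages) {
  const mediaBox = resolveMediaBox(page, pages)
  return mediaBox ? Object.assign({}, page, { MediaBox: mediaBox }) : page
}
```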

However, I am now not sure what the default behavior should be, or whether there should be an option for it (and if so, what its default should be). TBH, for my own use cases, I mostly set external PDFs as page templates and I expect them to be truncated to the size of my document. But I see that this does not make much sense when appending external PDF pages...

My initial guess would be

  • truncated for setTemplate
  • use original size for addPageOf

What would be your opinion?

rkusa avatar Sep 07 '18 17:09 rkusa

In my opinion, you should focus on the consistency of your API. You should stick with the current behaviour for all methods, and add an optional parameter to specify explicitly that the page format of the original page should be preserved. This is a minor patch, whereas changing the behaviour of the methods would be a breaking change that could impact other developers' code.
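
To illustrate the opt-in idea, here is a minimal sketch. The option name `keepPageSize` and the `choosePageSize` helper are hypothetical and do not exist in pdfjs; the point is only that a flag defaulting to `false` leaves existing callers unaffected.

```javascript
// Hypothetical: pick the size of the output page.
// `documentSize` and `externalSize` are { width, height } objects in PDF points.
function choosePageSize(documentSize, externalSize, options = {}) {
  // Assumed flag name; anything other than an explicit `true` keeps today's behaviour.
  const keepPageSize = options.keepPageSize === true
  return keepPageSize ? externalSize : documentSize
}
```

Called without options, the helper returns the document's own size, so code written against the current behaviour keeps working; only callers that pass `{ keepPageSize: true }` get the external page's size.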

Besides that, I personally use setTemplate in my code since I still want to add content on top of the imported page (footer, TOC targets, etc.) and want to preserve the aspect ratio. I cannot do that with addPageOf, which skips to the next page. Your proposal wouldn't work in my specific case.

jpbourgeon avatar Sep 10 '18 08:09 jpbourgeon

Hi, I'm curious to know whether you have decided what to do about this? ;-)

jpbourgeon avatar Sep 20 '18 13:09 jpbourgeon

I did some testing and it seems most PDFs are already added in their respective size, and only some are not. It depends on whether the Page itself specifies a MediaBox or not. So I would consider it a bug when pages added via addPageOf have the wrong size. That is, I think we should

  • [x] fix size of added external pages (via addPageOf) that have no MediaBox specified
  • [ ] add an option to setTemplate to adopt the page size of the external document

rkusa avatar Sep 22 '18 16:09 rkusa

Brilliant

This approach feels very consistent with the current behaviour of the library

jpbourgeon avatar Sep 22 '18 18:09 jpbourgeon

Pages added with addPageOf should now (on master) keep their original page size.

Though, I am not sure whether I am going to implement

add an option to setTemplate to adopt the page size of the external document

Because this leads to a lot of edge cases that have to be handled. E.g., what if the template is so small that there is no space left for content with the current document's padding setting?
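
The edge case mentioned above can be made concrete with a little arithmetic. The following is a sketch only: the `contentArea` helper and the shapes of `pageSize` and `padding` are assumptions for illustration, not part of the library.

```javascript
// Hypothetical check: how much room is left for content once padding
// is subtracted from the page? All values are in PDF points (1/72 inch).
function contentArea(pageSize, padding) {
  const width = pageSize.width - padding.left - padding.right
  const height = pageSize.height - padding.top - padding.bottom
  // A page is only usable if something is left in both directions.
  return { width, height, usable: width > 0 && height > 0 }
}
```

With a padding of 60 points on every side, a 100 by 100 point template would leave a negative width and height, which is the situation a page-size option for setTemplate would have to deal with in some way.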

How do you use setTemplate? In my case, I always want to output A4 pages no matter the template size a user provides.

rkusa avatar Sep 24 '18 05:09 rkusa

How does the library currently handle oversized content in undersized pages? In other words, I think it is not the responsibility of setTemplate to handle the content that is added to the page after it has successfully created the page from its input.

Consider the case without setTemplate: when you have too much content for a standard A4 page, or a font size so big that the text doesn't fit on the page, how does the library handle it right now? I guess it truncates the content or fails. The same would apply to a page from setTemplate that is too small, which would be a consistent response from the system.

Generally speaking, if adopting the page size of the external document becomes an option of setTemplate, I think it is only natural to assume that the developer who enables this option handles its constraints properly. That is, the dev makes sure that the content fits on the page, or handles the failure when it does not.


In my own case, I am building a PDF aggregator. The aggregator merges PDF files from a folder tree and can optionally: set a cover page from an external document, add a footer and/or page numbers, generate a TOC, and generate a changelog based on the creation dates of the merged PDF files.

Since I need to add content on each imported page (footer, page numbers, outline entries for the TOC and the changelog, etc.), I cannot use addPageOf which skips to the next page after merging. I use setTemplate instead.

My tool is mainly used in an administrative context, so I guess it will mostly work with A5, A4 and A3 pages. However, I expect it to fail in the case of oversized content in undersized pages.

jpbourgeon avatar Oct 03 '18 04:10 jpbourgeon

Hi @jpbourgeon,

In your case setTemplate will "fail" (in the sense of not giving the expected result) for both undersized and oversized content. I also think setTemplate may not be the best fit for your use case, since it only takes the first page of a document and does not automatically insert page breaks. That is, you would have to manually call setTemplate and a pageBreak for each page in an external document.
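
The manual approach described above might look roughly like this. It is a sketch under several assumptions: `templates` is an array of single-page external documents that the caller has already prepared, `decorate` is a hypothetical callback that adds footers or page numbers, and `addTemplatedPages` itself is not part of pdfjs. The order of setTemplate relative to pageBreak is also an assumption and would need to be checked against how the library starts new pages.

```javascript
// Sketch only: `doc` is any object exposing setTemplate() and pageBreak().
// `templates` holds one single-page external document per output page (assumed
// to be prepared by the caller). `decorate` is a hypothetical callback that
// writes the extra content (footer, page number, ...) for the given page index.
function addTemplatedPages(doc, templates, decorate) {
  templates.forEach((template, index) => {
    doc.setTemplate(template) // select the template for the page about to be filled
    if (index > 0) doc.pageBreak() // every template after the first gets a new page
    decorate(doc, index)
  })
}
```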

Your use case would require a mix of setTemplate and addPagesOf. I am not sure yet what the API for that could look like 🤔

rkusa avatar Oct 08 '18 07:10 rkusa