pdfjs
pdfjs copied to clipboard
Specify page dimension / orientation when merging external docs
Hello
When I merge a PDF file with different page size, pages that are not in the original orientation (paysage/portrait) are truncated.
Do you have a suggestion on how I could get the template page dimensions and use them for a specific page ?
Good point. It should be fairly easy to implement to always add pages from external documents in their original size.
In https://github.com/rkusa/pdfjs/blob/master/lib/external.js#L37
- if the page has no
MediaBox
property, set set the page'sMediaBox
property to theMediaBox
value of thethis.pages
object
However, I am now not sure what the default behavior should be and whether there should be an option for it (and if so, what to use as the default behavior). TBH, for my own use cases, I mostly set external PDFs as page page templates and I am expecting them to be truncated to the size of my document. But I see that this does not make much sense when appending external PDF pages...
My initial guess would be
- truncated for
setTemplate
- use original size for
addPageOf
What would be your opinion?
In my opinion, you should focus on the consistency of your API. You should stick with the current behaviour for all methods, and add an optional parameter to specify explicitly that the page format of the original page should be preserved. This is a minor patch, whereas changing the behaviour of the methods would be a breaking change that could impact other developer's code.
Besides that, I personnaly use setTemplate
in my code since I still want to add content on top of the imported page (footer, TOC targets, etc.) and want to preserve the aspect ratio. I cannot do that with addPageOf
which skips to the next page. Your proposition wouldn't work in my specific case.
Hi I'm curious to know if and what you decided to do on that topic ? ;-)
I did some testing and it seems most PDFs are already added in their respective size, and only some are not. It depends on whether the Page
itself specifies a MediaBox
or not. So I would consider it a bug, when pages added via addPageOf
have a wrong size. That is, I think we should
- [x] fix size of added external pages (via
addPageOf
) that have noMediaBox
specified - [ ] add an option to
setTemplate
to adopt the page size of the external document
Brilliant
This way feels very coherent with the current behaviour of the library
Pages added with addPageOf
should now (on master
) keep their original page size.
Though, I am not sure whether I am going to implement
add an option to
setTemplate
to adopt the page size of the external document
Because this leads to a lot of edge cases that have to be handled. E.g. what if the template size is so small that there is no space for content with the current document's padding setting.
How do you use setTemplate
? In my case, I always want to output A4 pages no matter the template size a user provides.
How does the library already handle the case of putting over sized content in under sized pages ? In other words, I think that it is not the responsibility of setTemplate
to handle the content that will be imported into the page after it successfully created the page from its input.
If you consider the case without setTemplate
: when you have too much content for a standard A4 page, or a font size so big that it doesn't fit into the page, how does the library handle it right now ? I guess it truncates the content or fails. The same will apply with a too small page from setTemplate
. Which is a coherent response of the system.
Generally speaking, since you will add the feature to adopt the page size of the external document as an option to setTemplate
, I think it is only natural to assume that the developer that activates this option handles its constraints properly. That is, the dev makes sure that the content fits into the page, or handles the failure when it's not the case.
In my own case, I build a PDF aggregator. The aggregator merges PDF files from a folders tree and can optionally : set a cover page from an external document, add a footer and/or page numbers, generate a TOC, generate a changelog based on the merged PDF files creation date.
Since I need to add content on each imported page (footer, page numbers, outline entries for the TOC and the changelog, etc.), I cannot use addPageOf
which skips to the next page after merging. I use setTemplate
instead.
My tool is mainly used in an administrative context, so I guess it will mostly work with A5-A4-A3 pages. However, I expect it to fail in the case of over sized content in under sized pages.
Hi @jpbourgeon,
in your case setTemplate
will "fail" (in terms of not giving the expected result) for both under sized and over sized content. Though I think setTemplate
is maybe not the best for your use case, since it only takes the first page of an document and also does not automatically insert page breaks. That is, you would have to manually call setTemplate
and a pageBreak
for each page in an extern document.
Your use case would require a mix of setTemplate
and addPagesOf
- Not sure yet, how the API could look like for that 🤔