vitepress-export-pdf
Merge PDF page number error
Question:
The page numbers are not merged. Is there any way to combine or customize the page numbers? Thank you very much.
Code:
const footerTemplate = `<div style="margin-bottom: -0.4cm; height: 70%; width: 100%; display: flex; justify-content: space-between; align-items: center; color: lightgray; border-top: solid lightgray 1px; font-size: 10px;">
<span style="margin-left: 15px;" class="url"></span><span style="margin-right: 15px;"><span class="pageNumber"></span>/<span class="totalPages"></span></span>
</div>`;
The PDF file format is all about producing the desired visual result for printing. It was not created for parsing the content. PDF files don’t contain a semantic layer.
Specifically, there is no information what the header, footer, page numbers, tables, and paragraphs are. The visual appearance is there and people might find heuristics to make educated guesses, but there is no way of being certain.
This is a shortcoming of the PDF file format. https://pypdf.readthedocs.io/en/stable/user/extract-text.html#missing-semantic-layer
The description language of the PDF format is somewhat similar to HTML; its main drawback is the lack of semantics. It describes the content of PDF pages through objects, as in the following example:
3 0 obj
<< /Filter /FlateDecode /Length 191 >>
stream
... (191 bytes of FlateDecode-compressed binary data, not printable) ...
endstream
endobj
1 0 obj
<< /Type /Page /Parent 2 0 R /Resources 4 0 R /Contents 3 0 R /MediaBox [0 0 595.28 841.89]
>>
endobj
4 0 obj
<< /ProcSet [ /PDF /Text ] /ColorSpace << /Cs1 5 0 R >> /Font << /TT1 6 0 R
>> >>
endobj
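To make the missing semantics concrete, here is a minimal sketch of the kind of heuristic a parser is left with. The content stream below is hypothetical and already decompressed (a real one, especially from Chrome, usually encodes text as glyph IDs, which makes this even harder), and `guessPageNumberTexts` is an illustrative name, not part of any library.

```javascript
// Hypothetical, already-decompressed content stream for one page.
// Both strings are drawn with the same operators; nothing marks the
// second one as a page number.
const contentStream = [
  "BT /TT1 12 Tf 72 770 Td (Getting Started) Tj ET",
  "BT /TT1 9 Tf 545 14 Td (3 / 120) Tj ET",
].join("\n");

// Heuristic: collect literal strings shown with the Tj operator and
// keep the ones that look like "n / m".
function guessPageNumberTexts(stream) {
  const shown = [...stream.matchAll(/\(([^)]*)\)\s*Tj/g)].map((m) => m[1]);
  return shown.filter((text) => /^\d+\s*\/\s*\d+$/.test(text));
}

console.log(guessPageNumberTexts(contentStream)); // [ '3 / 120' ]
```

The guess works here only because the sample is so simple. Body text such as "1 / 2" would match too, which is why this stays an educated guess and not a reliable way to find the footer.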
As a result, many PDF parsing libraries cannot reliably extract page numbers, so I cannot modify the existing page numbers when merging PDFs. I have thought about this for a long time without finding a solution.
However, there is an imperfect workaround: turn off page numbers when generating the PDF, but leave room for them, and add them yourself afterwards. Here is an example: https://github.com/condorheroblog/vitepress-export-pdf/commit/d26383d09313ebd3a009cee110429ada1aaed1d4#diff-79cab662fb8d5527d226a743033ffdfd879fcb65489faa6eabe35ca25a7906d5
import { readFileSync, writeFileSync } from "node:fs";
import { PDFDocument, StandardFonts, rgb } from "pdf-lib";

// Load the PDF that was generated without page numbers.
const existingPdfBytes = readFileSync("./vitepress.dev.pdf");
const pdfDoc = await PDFDocument.load(existingPdfBytes);
const helveticaFont = await pdfDoc.embedFont(StandardFonts.Helvetica);

const pages = pdfDoc.getPages();
const totalPages = pages.length;
const fontSize = 9;

for (let i = 0; i < totalPages; i++) {
  const page = pages[i];
  const { width } = page.getSize();
  const text = `${i + 1} / ${totalPages}`;

  // Draw "current / total" near the bottom-right corner of the page.
  page.drawText(text, {
    x: width - 50,
    y: fontSize + 5,
    size: fontSize,
    font: helveticaFont,
    color: rgb(127 / 255, 127 / 255, 127 / 255), // mid gray; components are in [0, 1]
  });
}

// Write the numbered copy to a new file.
const pdfBytes = await pdfDoc.save();
writeFileSync("pagination.pdf", pdfBytes);
It's not perfect, but it's good.
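If the PDFs are stamped one by one before merging instead of after, the labels need an offset so the numbering continues across files. A rough sketch of that bookkeeping, where `continuousPageLabels` and its `pageCounts` input are hypothetical names introduced only for illustration:

```javascript
// pageCounts: number of pages in each PDF, in merge order (hypothetical input).
// Returns, for each PDF, the label to draw on each of its pages.
function continuousPageLabels(pageCounts) {
  const total = pageCounts.reduce((sum, n) => sum + n, 0);
  let offset = 0; // pages that come before the current PDF
  return pageCounts.map((count) => {
    const labels = Array.from(
      { length: count },
      (_, i) => `${offset + i + 1} / ${total}`,
    );
    offset += count;
    return labels;
  });
}

console.log(continuousPageLabels([2, 3]));
// [ [ '1 / 5', '2 / 5' ], [ '3 / 5', '4 / 5', '5 / 5' ] ]
```

Each label can then be passed to `page.drawText` in place of the `text` value in the script above. If the files are merged into a single PDF first, the script above already numbers the pages continuously and this helper is not needed.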
I know that Cairo at least supports page labels. Perhaps pdf-lib does as well?