pdfjs: How can I get a PDF file's total number of pages?


Open BigWolf286 opened this issue 4 years ago • 5 comments

How can I get a PDF file's total number of pages?

Nov 21 '19 03:11 BigWolf286

See https://github.com/rkusa/pdfjs/blob/master/docs/header.md, e.g.:

const header = doc.header()
header.pageNumber((curr, total) => `${curr} / ${total}`)

Is this what you are looking for?

Nov 21 '19 13:11 rkusa

Thanks very much. What I was looking for is this:
const ext = new pdf.ExternalDocument(src);
let pageCount = ext.pageCount;
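
For reference, the page-count lookup above can be wrapped in a small helper. This is only a minimal sketch: `pageCountOf` is a hypothetical name, and the library is passed in as an argument (normally the result of `require('pdfjs')`) so the helper can be tried out without a real PDF at hand.

```javascript
// Hypothetical helper, not part of pdfjs itself.
// `pdf` is expected to be the pdfjs module, `src` a Buffer with the PDF bytes.
function pageCountOf(pdf, src) {
  const ext = new pdf.ExternalDocument(src)
  return ext.pageCount
}

// Typical use, assuming pdfjs is installed and input.pdf exists:
//   const fs = require('fs')
//   const pdf = require('pdfjs')
//   console.log(pageCountOf(pdf, fs.readFileSync('input.pdf')))
```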

I have another question: some PDFs do not appear to conform to the specification. When I split them, I get errors such as:
EOL expected but not found
or
Error: Invalid name
    at Function.parse (D:\project\xhgc.tool…b\object\name.js:56)
    at Function.parse (D:\project\xhgc.tool…ct\dictionary.js:71)
    at Object.exports.parse (D:\project\xhgc.tool…\object\value.js:20)
    at Function.parseInner (D:\project\xhgc.tool…object\object.js:79)
    at Function.parse (D:\project\xhgc.tool…object\object.js:67)

The code is as follows:
const doc = new pdf.Document({});
const src = fs.readFileSync(pdfPath);
const ext = new pdf.ExternalDocument(src);

The PDF is attached:

MQTT-3.1.1-CN.pdf

Do you have any ideas? (sending a heart)
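
When processing many files, one malformed PDF does not have to abort the whole run. The sketch below assumes the parse error is thrown synchronously while the external document is constructed; `tryLoadExternal` is a hypothetical name, and the library is again passed in as an argument (normally `require('pdfjs')`).

```javascript
// Hypothetical wrapper, not part of pdfjs itself.
// Returns { ok: true, ext } on success, or { ok: false, message }
// if constructing the external document throws.
function tryLoadExternal(pdf, src) {
  try {
    return { ok: true, ext: new pdf.ExternalDocument(src) }
  } catch (err) {
    return { ok: false, message: err.message }
  }
}
```

A caller could then log `message` together with the file path and continue with the next file.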

Nov 22 '19 07:11 BigWolf286

> Thanks very much. What I was looking for is this: const ext = new pdf.ExternalDocument(src); let pageCount = ext.pageCount;

For external documents, what you are doing should work. Does it not work for you?

> I have another question: some PDFs do not appear to conform to the specification. When I split them, I get errors such as: EOL expected but not found or Error: Invalid name at Function.parse (D:\project\xhgc.tool…b\object\name.js:56) at Function.parse (D:\project\xhgc.tool…ct\dictionary.js:71) at Object.exports.parse (D:\project\xhgc.tool…\object\value.js:20) at Function.parseInner (D:\project\xhgc.tool…object\object.js:79) at Function.parse (D:\project\xhgc.tool…object\object.js:67)

I finally merged https://github.com/rkusa/pdfjs/pull/142, so this issue might be fixed on master.

Nov 25 '19 10:11 rkusa

>> Thanks very much. What I was looking for is this: const ext = new pdf.ExternalDocument(src); let pageCount = ext.pageCount;

> For external documents, what you are doing should work. Does it not work for you?

It works fine, no problem.

>> I have another question: some PDFs do not appear to conform to the specification. When I split them, I get errors such as: EOL expected but not found or Error: Invalid name at Function.parse (D:\project\xhgc.tool…b\object\name.js:56) at Function.parse (D:\project\xhgc.tool…ct\dictionary.js:71) at Object.exports.parse (D:\project\xhgc.tool…\object\value.js:20) at Function.parseInner (D:\project\xhgc.tool…object\object.js:79) at Function.parse (D:\project\xhgc.tool…object\object.js:67)

> I finally merged #142, so this issue might be fixed on master.

I got it, thank you. I have a new question:
TypeError: Cannot read property 'compressed' of undefined
The PDF is attached: 4.样张-priview.pdf

Nov 26 '19 06:11 BigWolf286

I've checked your PDF and the error occurs because the PDF contains a lot of object references to objects that do not exist in the document and that are not mentioned in the cross reference table.
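
A rough way to look for such dangling references is to scan the raw bytes for `N G R` references and compare them with the `N G obj` definitions found in the file. The sketch below is a heuristic only, and `findDanglingRefs` is a hypothetical name: it cannot see objects stored inside compressed object streams, and binary stream data may produce false matches, so its output is a starting point for inspection rather than proof that a file is invalid.

```javascript
// Heuristic scan: counts indirect references ("N G R") whose object
// number has no matching "N G obj" definition in the raw file bytes.
function findDanglingRefs(buf) {
  // latin1 maps every byte to one character, so no bytes are lost
  const text = buf.toString('latin1')

  const defined = new Set()
  for (const m of text.matchAll(/(\d+)\s+(\d+)\s+obj\b/g)) {
    defined.add(Number(m[1]))
  }

  // object number -> how often it is referenced without a definition
  const missing = new Map()
  for (const m of text.matchAll(/(\d+)\s+(\d+)\s+R\b/g)) {
    const num = Number(m[1])
    if (!defined.has(num)) {
      missing.set(num, (missing.get(num) || 0) + 1)
    }
  }
  return missing
}
```

Calling it with `fs.readFileSync(pdfPath)` returns a `Map` that can be printed to see which object numbers deserve a closer look.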

It also looks like your PDF was incrementally updated. For such documents, the spec says:

The cross-reference section added when a file is updated contains entries only for objects that have been changed, replaced, or deleted. Deleted objects are left unchanged in the file, but are marked as deleted by means of their cross-reference entries. The added trailer contains all the entries (perhaps modified) from the previous trailer, as well as a Prev entry giving the location of the previous cross-reference section (see Table 3.13 on page 97).

... and especially this part

The added trailer contains all the entries (perhaps modified) from the previous trailer

... does not seem to be the case for your document. It looks to me as though your PDF is invalid, though I may have missed the part of the spec that describes this behaviour. I am not sure how to proceed here. I probably cannot invest more time digging into the PDF spec myself, but I'd be open to investigating further if someone can explain why the document references e.g. object 64 so often, even though the new XRef object starts at index 66.
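
To check whether a file has been incrementally updated, one rough heuristic is to count the `startxref` keywords and `%%EOF` markers in the raw bytes, since each update appends a new cross-reference section and trailer. The sketch below uses a hypothetical helper name, `countRevisionMarkers`. A count above one is only a hint: linearized files can also contain more than one marker, and stream data could in principle contain these byte sequences.

```javascript
// Heuristic: counts the markers that normally end each revision of a PDF.
function countRevisionMarkers(buf) {
  // latin1 maps every byte to one character, so no bytes are lost
  const text = buf.toString('latin1')
  const count = (re) => (text.match(re) || []).length
  return {
    eofMarkers: count(/%%EOF/g),
    startxrefs: count(/\bstartxref\b/g),
  }
}
```

For example, `countRevisionMarkers(fs.readFileSync(pdfPath))` could be run on a problem file before deciding whether the incremental-update rules quoted above apply to it.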

Dec 03 '19 08:12 rkusa
