unpdf
Strange behavior of `getDocumentProxy`'s buffer when extracting text AND rendering a page as an image (only for some PDFs)
Environment
Node v20.11.1, unpdf v0.11.0
Reproduction
I got the original error in a server route of a Nuxt 3 project. Also, in the original app I performed other operations besides text/metadata extraction and image rendering.
For this issue, I set up a fresh Nitro project that isolates just the error. You can find the repo here: https://github.com/ndrbrt/unpdf-issue
Describe the bug
First of all, I noticed the issue only with some PDFs (specifically PDFs containing images, but I don't know whether it's related to #4, nor whether it only affects PDFs with images).
Error A
The original code was similar to that in server/api/error-a.ts.
If you run the dev server and open, e.g.:
- http://localhost:3000/api/error-a?url=https://github.com/raphink/geneve_1564/releases/download/2015-07-08_01/geneve_1564.pdf
You get the following error:
```
[nitro] [request error] [unhandled] Cannot read properties of undefined (reading 'createCanvas')
  at i.constructor._createCanvas (./node_modules/.pnpm/[email protected]/node_modules/unpdf/dist/pdfjs.mjs:1:1552904)
  at i.constructor.create (./node_modules/.pnpm/[email protected]/node_modules/unpdf/dist/pdfjs.mjs:1:1399305)
  at CachedCanvases.getCanvas (./node_modules/.pnpm/[email protected]/node_modules/unpdf/dist/pdfjs.mjs:1:1474861)
  at CanvasGraphics.beginGroup (./node_modules/.pnpm/[email protected]/node_modules/unpdf/dist/pdfjs.mjs:1:1502437)
  at CanvasGraphics.executeOperatorList (./node_modules/.pnpm/[email protected]/node_modules/unpdf/dist/pdfjs.mjs:1:1482511)
  at InternalRenderTask._next (./node_modules/.pnpm/[email protected]/node_modules/unpdf/dist/pdfjs.mjs:1:1591245)
  at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
```
However, as I said, with other PDFs everything works fine, e.g.:
- http://localhost:3000/api/error-a?url=https://bitcoin.org/bitcoin.pdf
Working version
The only workaround I found is the one in server/api/working.ts: I copy the original buffer before passing it to getDocumentProxy, then pass the copy to renderPageAsImage. Both of these requests succeed:
- http://localhost:3000/api/working?url=https://bitcoin.org/bitcoin.pdf
- http://localhost:3000/api/working?url=https://github.com/raphink/geneve_1564/releases/download/2015-07-08_01/geneve_1564.pdf
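A likely explanation for why the copy helps, assuming PDF.js behaves as its `getDocument` documentation describes: typed arrays passed as `data` are generally transferred to the worker, which takes ownership of them and leaves the caller's buffer detached. The plain TypeScript sketch below (no unpdf involved) simulates that handoff with `structuredClone` and its `transfer` option; `simulateTransfer` is a hypothetical helper for illustration, not part of unpdf or PDF.js.

```typescript
// Hypothetical helper: simulates PDF.js taking ownership of the PDF bytes.
// Assumption: getDocumentProxy transfers the underlying ArrayBuffer, which
// structuredClone's `transfer` option reproduces here.
function simulateTransfer(data: Uint8Array): Uint8Array {
  return structuredClone(data, { transfer: [data.buffer as ArrayBuffer] });
}

// Stand-in for the downloaded PDF bytes ("%PDF")
const original = new Uint8Array([0x25, 0x50, 0x44, 0x46]);

// Workaround: copy BEFORE handing the data over; slice() allocates a new buffer
const copy = original.slice();

const received = simulateTransfer(original);

console.log(original.byteLength); // 0: the original buffer is now detached
console.log(copy.byteLength); // 4: the copy is unaffected
console.log(received.byteLength); // 4: the receiver owns the bytes
```

Under this assumption, any later call that reuses `original` sees an empty buffer, while `copy` can still be passed to a second consumer such as renderPageAsImage.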
Error B
I also tried another approach in server/api/error-b.ts: passing `new Uint8Array(buffer)` directly to renderPageAsImage. With this approach, if you open:
- http://localhost:3000/api/error-b?url=https://github.com/raphink/geneve_1564/releases/download/2015-07-08_01/geneve_1564.pdf
You get this error:
```
[nitro] [request error] [unhandled] Unable to deserialize cloned data.
  at LoopbackPort.postMessage (./node_modules/.pnpm/[email protected]/node_modules/unpdf/dist/pdfjs.mjs:1:1573782)
  at MessageHandler.sendWithPromise (./node_modules/.pnpm/[email protected]/node_modules/unpdf/dist/pdfjs.mjs:1:1514035)
  at ./node_modules/.pnpm/[email protected]/node_modules/unpdf/dist/pdfjs.mjs:1:1561726
  at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
```
Interestingly, in this case, if you repeat the request with text extraction disabled (note the `text=false` query param), it works:
- http://localhost:3000/api/error-b?url=https://github.com/raphink/geneve_1564/releases/download/2015-07-08_01/geneve_1564.pdf&text=false
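When experimenting with copies as in error-b.ts, note that `new Uint8Array(x)` only copies when `x` is itself a typed array; when `x` is an `ArrayBuffer`, it creates a new view over the same memory, and that view is detached together with the original. This is standard JavaScript behavior, shown below in plain TypeScript. Whether it explains Error B depends on what `buffer` is in server/api/error-b.ts, which is an assumption not confirmed in this thread.

```typescript
const source = new Uint8Array([1, 2, 3]);

// From an ArrayBuffer: a new view over the SAME memory (no copy)
const view = new Uint8Array(source.buffer);

// From a typed array: a new ArrayBuffer with the elements copied
const copied = new Uint8Array(source);

source[0] = 99;
console.log(view[0]); // 99: shares memory with `source`
console.log(copied[0]); // 1: independent copy

// Transfer (detach) the original buffer, as a handoff to a worker would
structuredClone(source, { transfer: [source.buffer as ArrayBuffer] });

console.log(view.byteLength); // 0: detached along with `source`
console.log(copied.byteLength); // 3: the copy survives
```

If you need an independent copy regardless of the input type, `.slice()` on a typed array (or `ArrayBuffer.prototype.slice()` on a raw buffer) always allocates new memory.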
Additional context
I did not use the official PDF.js build because I couldn't get it to work. Instead, I used unpdf's default build, and everything worked fine until I ran into the problem described above.
Hi there! Thanks for the thorough issue description. One question: how did you deploy the app? Canvas support is only available on Node.js deploy targets.
Hi @johannschopplich, I deployed the app on Vercel using the default config as in https://nuxt.com/deploy/vercel. (It works the same way both on Vercel and locally)
I see. It's probably not gonna work on Vercel, since the canvas module requires Node.js bindings.
For your other examples, please use the official PDF.js build, because the serverless build (used by unpdf by default) has canvas support stripped out.
Can you please follow the renderPageAsImage guide to set up the pdfjs-dist build used together with canvas?
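```ts
import { configureUnPDF, renderPageAsImage } from "unpdf";

await configureUnPDF({
  // Use the official PDF.js build instead of unpdf's serverless build
  pdfjs: () => import("pdfjs-dist"),
});

// `pdf` is the PDF to render (defined elsewhere in your handler)
const result = await renderPageAsImage(pdf, 1, {
  // Provide the Node.js `canvas` package used for rendering
  canvas: () => import("canvas"),
});
```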
Actually I did try to use pdfjs-dist, but it resulted in an error.
```sh
yarn add pdfjs-dist
```

```ts
await configureUnPDF({
  // Use the official PDF.js build
  pdfjs: async () => await import('pdfjs-dist'),
})
```
```
ERROR [nuxt] [request error] [unhandled] [500] Resolving failed. Please check the provided configuration.
  at resolvePDFJSImports (./node_modules/unpdf/dist/index.mjs:33:13)
  at async configureUnPDF (./node_modules/unpdf/dist/index.mjs:179:5)
  at Object.handler (./server/api/test.ts:5:1)
  at async ./node_modules/h3/dist/index.mjs:1975:19
  at async Object.callAsync (./node_modules/unctx/dist/index.mjs:72:16)
  at async Server.toNodeHandle (./node_modules/h3/dist/index.mjs:2266:7)
```