pdf2json
Silent failure when parsing semi-transparent content
Hello, I recently found a bug when parsing some .pdf files. If a file has symbols in a semi-transparent color, the program just stops without any error message. Here's the code that stops silently (Declaration.pdf is the "unparsable" document):
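const fs = require("fs");
const PDFParser = require("pdf2json");

// The second argument (1) asks for raw text, which getRawTextContent() relies on.
const pdfParser = new PDFParser(this, 1);

pdfParser.on("pdfParser_dataError", errData => console.error(errData.parserError));
pdfParser.on("pdfParser_dataReady", pdfData => {
  // fs.writeFile is asynchronous; the callback reports any write error.
  fs.writeFile("./content.txt", pdfParser.getRawTextContent(), err => {
    if (err) console.error(err);
  });
});

pdfParser.loadPDF("./Declaration.pdf");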
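Since neither event fires in the failing case, one way to make the stall visible might be a watchdog timer around the two events. This is only a minimal sketch: `parseWithWatchdog` and `timeoutMs` are hypothetical names, not part of pdf2json, and the only parser features it assumes are the two events and `loadPDF` used above.

```javascript
// Sketch only: parseWithWatchdog and timeoutMs are hypothetical, not pdf2json API.
// `parser` is any object exposing on() and loadPDF(), such as a PDFParser instance.
function parseWithWatchdog(parser, pdfPath, timeoutMs) {
  return new Promise((resolve, reject) => {
    // If neither event arrives in time, report the stall instead of ending silently.
    const timer = setTimeout(
      () => reject(new Error(`Parsing ${pdfPath} timed out after ${timeoutMs} ms`)),
      timeoutMs
    );

    parser.on("pdfParser_dataError", errData => {
      clearTimeout(timer);
      // Prefer the nested parserError when present, as in the snippet above.
      reject(errData && errData.parserError ? errData.parserError : errData);
    });

    parser.on("pdfParser_dataReady", pdfData => {
      clearTimeout(timer);
      resolve(pdfData);
    });

    parser.loadPDF(pdfPath);
  });
}
```

It could be called as `parseWithWatchdog(pdfParser, "./Declaration.pdf", 30000).catch(console.error)`. The 30-second limit is an arbitrary choice for illustration, and the sketch does not fix the underlying JPEG error; it only turns the silent stop into a reported one.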
I then tried to parse a document directly with pdf2json from the command line:
node pdf2json.js "../../documents/whitepaper/47/WP_En.pdf" "../../"
This time I got an error:
Warning: Unhandled rejection: Error: JPEG error: Unsupported color mode (4 components)
Error: JPEG error: Unsupported color mode (4 components)
at error (eval at <anonymous> (app\node_modules\pdf2json\lib\pdf.js:64:1), <anonymous>:195:9)
at JpegStream_ensureBuffer [as ensureBuffer] (eval at <anonymous> (app\node_modules\pdf2json\lib\pdf.js:64:1), <anonymous>:25571:7)
at JpegStream.DecodeStream_getBytes [as getBytes] (eval at <anonymous> (app\node_modules\pdf2json\lib\pdf.js:64:1), <anonymous>:24875:14)
at PDFImage_getImageBytes [as getImageBytes] (eval at <anonymous> (app\node_modules\pdf2json\lib\pdf.js:64:1), <anonymous>:21009:25)
at PDFImage_fillRgbaBuffer [as fillRgbaBuffer] (eval at <anonymous> (app\node_modules\pdf2json\lib\pdf.js:64:1), <anonymous>:20941:27)
at PDFImage_getImageData [as getImageData] (eval at <anonymous> (app\node_modules\pdf2json\lib\pdf.js:64:1), <anonymous>:21004:12)
at eval (eval at <anonymous> (app\node_modules\pdf2json\lib\pdf.js:64:1), <anonymous>:7125:34)
at Object.eval [as onResolve] (eval at <anonymous> (app\node_modules\pdf2json\lib\pdf.js:64:1), <anonymous>:20657:7)
at Object.runHandlers (eval at <anonymous> (app\node_modules\pdf2json\lib\pdf.js:64:1), <anonymous>:864:35)
at ontimeout (timers.js:466:11)
In short, my workaround was to comment out two lines (832-833) in /base/core/jpg.js:
if (!this.adobe)
  throw 'Unsupported color mode (4 components)';
After that I successfully parsed my document.
So, @modesty, can you fix this check, or just remove it?
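A less invasive variant of the workaround might be to fall back to a default when the Adobe marker is missing, rather than deleting the check outright. The sketch below is not the actual code in /base/core/jpg.js: `shouldApplyColorTransform` is a hypothetical helper, and the shape of the `adobe` object (a `transformCode` field) is an assumption made for illustration.

```javascript
// Hypothetical helper, NOT the real jpg.js code.
// `adobe` stands for a parsed Adobe APP14 marker, or undefined when the JPEG has none.
// The `transformCode` field name is an assumption for this sketch.
function shouldApplyColorTransform(numComponents, adobe) {
  if (numComponents === 4) {
    // No marker: assume the data needs no color transform
    // instead of throwing "Unsupported color mode (4 components)".
    if (!adobe) {
      return false;
    }
    return Boolean(adobe.transformCode);
  }
  if (numComponents === 3) {
    // Three-component JPEGs are conventionally YCbCr unless a marker says otherwise.
    return adobe ? Boolean(adobe.transformCode) : true;
  }
  // Grayscale and other component counts need no transform.
  return false;
}
```

With this fallback, colors in such images may still come out wrong, because without the marker the decoder cannot tell how the four components are encoded. For text extraction, as in the report above, that is probably acceptable.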
++
++
Can you upload the PDF?
This error still occurs in the latest version. Which non-public channel do you prefer, so that I can send you a copy of the PDF file?
Any update on this issue? I have the same problem.