pdf2json
Loading PDFs from a remote server via a stream
I'm trying to load a single PDF from a remote server. I can confirm that piping the request straight into a write stream saves the PDF correctly. Here is my approach:
var request = require('request');
var pdfParser = require('pdf2json');
var pdfUrl = 'somepdf.pdf'
var pdfPipe = request({url: pdfUrl, encoding:null}).pipe(pdfParser);
pdfPipe.on("pdfParser_dataError", err => console.error(err) );
pdfPipe.on("pdfParser_dataReady", pdf => {
    //let pdf = pdfParser.getMergedTextBlocksIfNeeded();
    console.log(pdfParser.getAllFieldsTypes());
});
However, I'm getting an error:
stream.js:45
dest.on('drain', ondrain);
^
TypeError: dest.on is not a function
at Request.Stream.pipe (stream.js:45:8)
at Request.pipe (/Users/zaf/development/minerva-bot/node_modules/request/request.js:1395:34)
at Object.<anonymous> (/Users/zaf/development/minerva-bot/plugins/exam_module/index.js:9:53)
at Module._compile (module.js:434:26)
at Object.Module._extensions..js (module.js:452:10)
at Module.load (module.js:355:32)
at Function.Module._load (module.js:310:12)
at Function.Module.runMain (module.js:475:10)
at startup (node.js:117:18)
at node.js:951:3
Code constructed from here: http://stackoverflow.com/a/36882510/3779915
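The trace ends inside `Stream.pipe`, which calls `dest.on(...)` on whatever it is given, so the error suggests that the value passed to `.pipe()` is not a stream instance. A minimal sketch of that failure mode, using only Node's built-in `stream` module: `FakeParser` and `tryPipe` are hypothetical names made up for illustration and are not part of pdf2json.

```javascript
const { Readable, Writable } = require("stream");

// Hypothetical stand-in for a module whose export is a class, not a stream.
class FakeParser {
  // Returns a writable stream that simply discards what it receives.
  createStream() {
    return new Writable({
      write(chunk, encoding, callback) {
        callback();
      }
    });
  }
}

// Pipe a small readable into `dest`; return the thrown error, or null on success.
function tryPipe(dest) {
  const source = Readable.from([Buffer.from("%PDF-")]);
  try {
    source.pipe(dest);
    return null;
  } catch (err) {
    source.destroy(); // clean up the half-connected source
    return err;
  }
}

// Passing the class itself fails: a constructor function has no .on method.
const errorFromClass = tryPipe(FakeParser);

// Passing a real stream obtained from an instance works.
const errorFromStream = tryPipe(new FakeParser().createStream());

console.log(errorFromClass instanceof TypeError);
console.log(errorFromStream === null);
```

In the question's code, `pdfParser` holds the value returned by `require('pdf2json')` rather than an object created with `new`, which matches the first case in this sketch.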
Try:
var PDFParser = require("./pdf2json/PDFParser");
var pdfParser = new PDFParser();
var pdfPipe = request({url: pdfUrl, encoding:null}).pipe(pdfParser);
I get an empty array. I am trying to fetch a PDF from a remote server. Any idea what could cause that?
We now need to use:
request({url: pdfUrl, encoding:null}).pipe(pdfParser.createParserStream());
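For anyone landing here, the `createParserStream()` approach can be wrapped in a promise. This is only a sketch: `parsePdfStream` is a hypothetical helper name, the event names are the ones used in the question, and it assumes the installed pdf2json version provides `createParserStream()` as described in this thread.

```javascript
// Hypothetical helper: pipe a readable stream of PDF bytes into a parser
// that exposes createParserStream(), and resolve with whatever the parser
// emits on "pdfParser_dataReady".
function parsePdfStream(source, pdfParser) {
  return new Promise((resolve, reject) => {
    pdfParser.once("pdfParser_dataError", reject);
    pdfParser.once("pdfParser_dataReady", resolve);
    source.once("error", reject); // network or read failure on the source
    source.pipe(pdfParser.createParserStream());
  });
}

// Usage, assuming pdf2json and request are installed and pdfUrl is the
// full URL of the PDF:
//
//   const PDFParser = require("pdf2json");
//   const request = require("request");
//   const pdfParser = new PDFParser();
//   parsePdfStream(request({ url: pdfUrl, encoding: null }), pdfParser)
//     .then(() => console.log(pdfParser.getAllFieldsTypes()))
//     .catch(err => console.error(err));
```

Note that `pdfParser` here is an instance created with `new PDFParser()`, and the listeners are attached to that instance rather than to the value returned by `.pipe()`.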