readable-proxy

stdout maxBuffer exceeded

Open luckydonald opened this issue 9 years ago • 8 comments

stdout maxBuffer exceeded

Sorry to bother you, but I have no idea whether that is coming from the proxy or the Readability.js lib, or whether I can raise that buffer limit somehow. The website I tried: http://www.equestriadaily.com/2016/10/music-intersekt-twilight-says-bass-house.html

What can I do?

luckydonald, Oct 12 '16 01:10

We need to allow specifying a maxBuffer option to execFile.

Would you want to work on a patch?
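A minimal sketch of what passing maxBuffer through to Node's child_process.execFile(file, args, options, callback) could look like. The function name runWithMaxBuffer and the 10 MiB default are hypothetical and not taken from scrape.js; for readable-proxy, the command would presumably be the PhantomJS binary with phantom-scrape.js as the first argument.

```javascript
const { execFile } = require("child_process");

// Hypothetical default of 10 MiB; readable-proxy does not define this constant.
const DEFAULT_MAX_BUFFER = 10 * 1024 * 1024;

// Runs `command` with `args` and resolves with its stdout as a UTF-8 string.
// For readable-proxy, `command` would be the phantomjs binary and `args`
// would start with phantom-scrape.js (an assumption based on this thread).
function runWithMaxBuffer(command, args, maxBuffer = DEFAULT_MAX_BUFFER) {
  return new Promise((resolve, reject) => {
    execFile(command, args, { maxBuffer }, (err, stdout) => {
      // If stdout grows past maxBuffer, Node terminates the child and
      // reports a "maxBuffer" error through `err`.
      if (err) reject(err);
      else resolve(stdout);
    });
  });
}
```

For reference, the default maxBuffer was 200 * 1024 bytes in the Node releases current when this issue was filed and has been 1024 * 1024 bytes since Node 12, so a scrape whose stdout exceeds that size fails with this error unless the option is raised.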

n1k0, Oct 12 '16 06:10

I'm actually a Python guy and have never touched Node before.

So the solution would be to just raise that value? Is there a way to set it dynamically, or even remove the limit entirely?

From searching, I figured you probably mean this line: scrape.js:19
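One way to make the limit configurable instead of hard-coded is to read it from an environment variable, in the same spirit as the READABILITY_LIB_PATH and PORT settings. This sketch assumes a hypothetical READABLE_PROXY_MAX_BUFFER variable (readable-proxy does not define one) and falls back to a 10 MiB default when it is missing or invalid; maxBufferFromEnv is likewise a hypothetical name.

```javascript
// Reads a maxBuffer size in bytes from the hypothetical READABLE_PROXY_MAX_BUFFER
// environment variable. `env` is injectable so the function is easy to test.
function maxBufferFromEnv(env = process.env, fallback = 10 * 1024 * 1024) {
  const raw = env.READABLE_PROXY_MAX_BUFFER;
  // Missing or blank: use the fallback.
  if (raw === undefined || raw.trim() === "") return fallback;
  const bytes = Number(raw);
  // Non-numeric, infinite, zero or negative: also use the fallback.
  if (!Number.isFinite(bytes) || bytes <= 0) return fallback;
  return Math.floor(bytes);
}
```

In a Docker setup this could then be set with a line such as ENV READABLE_PROXY_MAX_BUFFER 20971520 (20 MiB), and the result passed as the maxBuffer option to execFile.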

Why do we need to spawn a child process in the first place?

luckydonald, Oct 19 '16 14:10

The question still stands: Why do we need to spawn a child process in the first place?

Edit:

  • I figured out that it runs phantom-scrape.js
    • Why can't we import that file the usual way (with require)?
  • phantom-scrape.js:
    • reads the URL from system.args[1]
    • reads readabilityPath (the Firefox Readability.js script) from system.args[2]
    • reads the user agent from system.args[3]
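The argument order listed above can be kept in one place on the Node side. PhantomJS puts the script path itself in system.args[0], so the URL, the Readability.js path and the user agent end up at indexes 1 to 3. buildPhantomArgs is a hypothetical helper, a sketch rather than readable-proxy's actual code.

```javascript
// Builds the args array for running phantom-scrape.js under PhantomJS.
// Inside PhantomJS: system.args[0] = script path, [1] = URL,
// [2] = Readability.js path, [3] = user agent.
function buildPhantomArgs(scriptPath, url, readabilityPath, userAgent) {
  const params = { scriptPath, url, readabilityPath, userAgent };
  for (const [name, value] of Object.entries(params)) {
    // Fail early instead of letting PhantomJS read an undefined argument.
    if (typeof value !== "string" || value === "") {
      throw new TypeError(`${name} must be a non-empty string`);
    }
  }
  return [scriptPath, url, readabilityPath, userAgent];
}
```

The returned array would be passed as the args parameter of execFile (or spawn), together with the path to the PhantomJS binary as the command.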

luckydonald, Oct 19 '16 14:10

For a bit of context: I'm trying to run it in a Docker container so I can use the API.

This is the Dockerfile:

FROM node:latest


RUN apt-get update && apt-get install -y git && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*

RUN mkdir -p /app/proxy
RUN mkdir -p /app/lib
WORKDIR /app/proxy


ENV READABILITY_LIB_PATH /app/lib/Readability.js
ENV PORT 80
EXPOSE 80

RUN git clone https://github.com/n1k0/readable-proxy /app/proxy
RUN git clone https://github.com/mozilla/readability /app/lib

RUN npm install

CMD ["npm", "start"]

Is the subprocess approach left over from the project initially being a CLI application, and never changed to be importable by other projects?

luckydonald, Oct 19 '16 14:10

@n1k0, can you tell me why we need to spawn a child process in the first place? Is it because this used to be a CLI app, so it's kind of legacy code? Could we just import (require) it directly instead of running it as a separate process? See the complete question above. Kind of looking forward to getting this API working :D

luckydonald, Oct 24 '16 10:10

why do we need to spawn a child process in the first place?

Because we need to run a PhantomJS script, which isn't based on Node but on QtWebKit, so it can't share the same JS runtime and event loop as the Node CLI script.
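Since PhantomJS has to run as a separate process, a child process is needed either way; what can change is how its output is collected. execFile buffers stdout and enforces maxBuffer, whereas spawn exposes stdout as a stream and lets the caller accumulate chunks with no built-in cap. A minimal sketch, assuming the caller passes the PhantomJS binary as command and phantom-scrape.js plus its arguments as args; runCollectingStdout is a hypothetical name. Memory use then grows with the output, so dropping the cap trades this error for unbounded buffering.

```javascript
const { spawn } = require("child_process");

// Runs `command` with `args`, collecting all of stdout without a size limit.
// Resolves with stdout as a UTF-8 string; rejects on a non-zero exit code.
function runCollectingStdout(command, args) {
  return new Promise((resolve, reject) => {
    const child = spawn(command, args);
    const outChunks = [];
    const errChunks = [];
    child.stdout.on("data", (chunk) => outChunks.push(chunk));
    // Read stderr too, so the child never blocks on a full stderr pipe.
    child.stderr.on("data", (chunk) => errChunks.push(chunk));
    // Emitted if the process could not be started at all.
    child.on("error", reject);
    child.on("close", (code) => {
      if (code === 0) {
        resolve(Buffer.concat(outChunks).toString("utf8"));
      } else {
        const stderr = Buffer.concat(errChunks).toString("utf8");
        reject(new Error(`${command} exited with code ${code}: ${stderr}`));
      }
    });
  });
}
```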

n1k0, Oct 24 '16 20:10

The readable HTML should be smaller than the original page, right? So could we just use the length of the website + 1024 as maxBuffer? Would that work?
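Sizing the buffer from the page length is plausible, but two assumptions deserve checking. First, the DOM that PhantomJS serialises after running the page's scripts can differ from the raw HTML that http.get downloads. Second, if phantom-scrape.js prints its result as JSON (an assumption here, as is the { content } shape), escaping makes the output larger than the HTML it contains. This sketch measures that escaping overhead; jsonOverhead is a hypothetical helper.

```javascript
// Compares the UTF-8 size of an HTML fragment with the size of the same
// fragment wrapped in a JSON object, as a JSON-printing scraper would emit it.
function jsonOverhead(html) {
  const raw = Buffer.byteLength(html, "utf8");
  const encoded = Buffer.byteLength(JSON.stringify({ content: html }), "utf8");
  return { raw, encoded, extra: encoded - raw };
}

// Each " gains a backslash and the newline becomes the two characters \n.
const sample = '<p class="lead">He said "hi"</p>\n';
console.log(jsonOverhead(sample)); // { raw: 33, encoded: 52, extra: 19 }
```

Because the overhead grows with the number of quotes, backslashes and newlines in the content, a fixed 1024 bytes of headroom may not cover a long article; a multiplier on the page length is a safer guess, though still only a heuristic.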

luckydonald, Oct 24 '16 21:10

So, something like this. I have no idea how to wire it into the program, because I have never worked with that async style...

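var http = require('http'); // use require('https') for https:// URLs

// Resolve with the total byte length of the page body (all chunks, not just the first).
var length = new Promise(function(fulfill, reject) {
  http.get(url, function(res) {
    var total = 0;
    res.on('data', function(d) {
      total += d.length; // d is a Buffer, so .length is its size in bytes
    });
    res.on('end', function() {
      fulfill(total);
    });
    res.on('error', reject);
  }).on('error', reject);
});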
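To wire the length into the scrape, one option is an async function that awaits the page size and then runs the scraper with a derived maxBuffer. The sketch applies the thread's page length + 1024 rule; scrapeWithSizedBuffer and its injected fetchLength and runScraper parameters are hypothetical, so it does not assume readable-proxy's real function names. async/await requires Node 7.6 or later. Note also that http.get does not follow redirects, so a redirecting URL would report the size of the redirect response rather than the page.

```javascript
// fetchLength: hypothetical async function resolving to the page size in bytes
// (for example, one that sums the chunks of an http.get response).
// runScraper: hypothetical async function that runs PhantomJS with the given options.
async function scrapeWithSizedBuffer(url, fetchLength, runScraper, headroom = 1024) {
  const pageBytes = await fetchLength(url);
  // The thread's rule of thumb: page length plus a fixed amount of headroom.
  const maxBuffer = pageBytes + headroom;
  return runScraper(url, { maxBuffer });
}

// Usage with stand-in functions, just to show the flow:
scrapeWithSizedBuffer(
  "http://example.com/",
  async () => 5000,
  async (url, options) => `would scrape ${url} with maxBuffer ${options.maxBuffer}`
).then(console.log); // would scrape http://example.com/ with maxBuffer 6024
```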

luckydonald, Oct 24 '16 23:10