readable-proxy
stdout maxBuffer exceeded
Sorry to bother you, but I have no idea if that is coming from the proxy or the Readability.js lib, or if I can raise that buffer limit somehow.
The website I tried: http://www.equestriadaily.com/2016/10/music-intersekt-twilight-says-bass-house.html
What can I do?
I am actually a Python guy and have never touched Node before.
So would the solution be to just raise that value? Is there a way to set it dynamically, or even remove the limit entirely?
From searching, I figure you may mean this line: scrape.js:19
Why do we need to spawn a child process in the first place?
Edit:

- figured it opens `phantom-scrape.js`
- Why can't we import that file traditionally (`require`)?
- `phantom-scrape.js` gets the url from `system.args[1]`, `readabilityPath` (the Firefox js) is `system.args[2]`, and a user agent is at `system.args[3]`

The question still stands: why do we need to spawn a child process in the first place?
For a bit of context: I am trying to run it in a Docker container to use the API.
This is the Dockerfile:
```
FROM node:latest
RUN apt-get update && apt-get install -y git && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*
RUN mkdir -p /app/proxy
RUN mkdir -p /app/lib
WORKDIR /app/proxy
ENV READABILITY_LIB_PATH /app/lib/Readability.js
ENV PORT 80
EXPOSE 80
RUN git clone https://github.com/n1k0/readable-proxy /app/proxy
RUN git clone https://github.com/mozilla/readability /app/lib
RUN npm install
CMD ["npm", "start"]
```
Is the subprocess stuff left over from this initially being a CLI application that was never changed to be importable by other projects?
Maybe @n1k0 can tell me: why do we need to spawn a child process in the first place? Is that because it was a CLI app before, so kinda legacy code? Can we maybe just import (`require`) it directly instead of calling it via shell? See the complete question above. Kinda looking forward to getting this API working :D
> why do we need to spawn a child process in the first place?
Because we need to run a phantomjs script, which isn't based on Node but on QtWebKit, and therefore can't share the same JS execution runtime & event loop as the CLI node script.
The readable HTML should be smaller than the original page, right?
So could we just use the length of the website + 1024 as maxBuffer?
Would that work?
So, something like this? I have no idea how to fit that into the program, because I have never worked with this async approach...

```js
var http = require('http');

// Download the page once, summing chunk sizes, and fulfill with the
// total byte count once the response has ended.
var length = new Promise(function(fulfill, reject) {
  http.get(url, function(res) {
    var bytes = 0;
    res.on('data', function(d) {
      bytes += d.length; // d is a Buffer, so .length is its size in bytes
    });
    res.on('end', function() {
      fulfill(bytes);
    });
  }).on('error', reject);
});
```