node-read
Links that are not working correctly. [post them here]
If you find any links that node-read cannot correctly parse, please post them here.
http://bits.blogs.nytimes.com/2014/04/26/writing-in-a-nonstop-world
(node) warning: possible EventEmitter memory leak detected. 11 listeners added. Use emitter.setMaxListeners() to increase limit.
Trace
at Request.EventEmitter.addListener (events.js:160:15)
at Request.self._buildRequest (~/Desktop/temp2/node_modules/node-read/node_modules/request/request.js:366:10)
at Request.init (~/Desktop/temp2/node_modules/node-read/node_modules/request/request.js:503:10)
at Request.onResponse (~/Desktop/temp2/node_modules/node-read/node_modules/request/request.js:899:10)
at ClientRequest.g (events.js:180:16)
at ClientRequest.EventEmitter.emit (events.js:95:17)
at HTTPParser.parserOnIncomingClient [as onIncoming] (http.js:1688:21)
at HTTPParser.parserOnHeadersComplete [as onHeadersComplete] (http.js:121:23)
at Socket.socketOnData [as ondata] (http.js:1583:20)
at TCP.onread (net.js:527:27)
(node) warning: possible EventEmitter memory leak detected. 11 listeners added. Use emitter.setMaxListeners() to increase limit.
Trace
at Request.EventEmitter.addListener (events.js:160:15)
at Request.start (~/Desktop/temp2/node_modules/node-read/node_modules/request/request.js:700:8)
at Request.end (~/Desktop/temp2/node_modules/node-read/node_modules/request/request.js:1319:28)
at ~/Desktop/temp2/node_modules/node-read/node_modules/request/request.js:418:14
at process._tickCallback (node.js:415:13)
~/Desktop/temp2/node_modules/node-read/index.js:78
parseDOM(buffer.toString("utf8"), res);
^
TypeError: Cannot call method 'toString' of undefined
at Request._callback (~/Desktop/temp2/node_modules/node-read/index.js:78:23)
at self.callback (~/Desktop/temp2/node_modules/node-read/node_modules/request/request.js:121:22)
at Request.EventEmitter.emit (events.js:95:17)
at Request.onResponse (~/Desktop/temp2/node_modules/node-read/node_modules/request/request.js:857:12)
at ClientRequest.g (events.js:180:16)
at ClientRequest.EventEmitter.emit (events.js:95:17)
at HTTPParser.parserOnIncomingClient [as onIncoming] (http.js:1688:21)
at HTTPParser.parserOnHeadersComplete [as onHeadersComplete] (http.js:121:23)
at Socket.socketOnData [as ondata] (http.js:1583:20)
at TCP.onread (net.js:527:27)
This is a bug between nytimes.com links and the request library. See: https://github.com/mikeal/request/issues/311#issuecomment-10237355 https://github.com/mikeal/request/issues/673 https://github.com/mikeal/request/issues/865
Not much I can do here.
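Separately from the request issue, the TypeError at the end of the trace comes from calling toString on an undefined buffer. A minimal sketch of a guard that would turn the crash into a reported error, assuming the callback receives an error and a body buffer as the trace suggests; `bodyToString` is a hypothetical helper and not part of node-read:

```javascript
// Hypothetical helper (not part of node-read): converts the body handed to a
// request-style callback into a string, returning an error instead of throwing
// when the body is missing.
function bodyToString(err, buffer) {
  if (err) {
    // Pass through any error reported by the HTTP layer.
    return { error: err, html: null };
  }
  if (buffer === undefined || buffer === null) {
    // This is the case that currently throws "Cannot call method 'toString'".
    return { error: new Error("empty response body"), html: null };
  }
  return { error: null, html: buffer.toString("utf8") };
}
```

This would not make the NYTimes link parse, but the caller would get an error in its callback rather than an uncaught exception.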
Trailing white space in title from: http://www.howiechong.com/journal/2014/2/bike-helmets
Thanks! Fixed that.
First of all, thanks for doing this! The jsdom one was way too slow for my purposes and I've become increasingly frustrated by it. The performance improvements are off the charts!
A couple minor issues popped up when I migrated to your version:
The h1 tag is coming back inside a p tag. http://www.bbc.co.uk/sport/0/football/26060048
Also, in node-readability I believe these codes (&#8217; &#039; &quot; &#8220; &#8221;) were translated automatically. Up to you if you want to mimic that behaviour or not.
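In case it helps, here is a rough sketch of the kind of translation I mean. It is not how node-readability does it, just an illustration; `decodeEntities` and the `NAMED` table are hypothetical names, and only a handful of named entities are covered:

```javascript
// Hypothetical helper: decodes numeric HTML entities (decimal and hex) and a
// small table of named ones. Unknown named entities are left untouched.
const NAMED = { quot: '"', amp: "&", lt: "<", gt: ">", apos: "'" };

function decodeEntities(text) {
  return text.replace(/&(#x[0-9a-fA-F]+|#[0-9]+|[a-zA-Z]+);/g, (match, body) => {
    if (body[0] === "#") {
      const isHex = body[1] === "x";
      const code = parseInt(body.slice(isHex ? 2 : 1), isHex ? 16 : 10);
      // Leave out-of-range code points as they were instead of throwing.
      if (code > 0x10ffff) {
        return match;
      }
      return String.fromCodePoint(code);
    }
    return Object.prototype.hasOwnProperty.call(NAMED, body) ? NAMED[body] : match;
  });
}
```

The replacement runs in a single pass, so a string like `&amp;quot;` becomes `&quot;` and is not decoded a second time.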
Thanks! I'll look into it.
Another issue, I'm afraid...
http://www.bbc.co.uk/sport/0/football/29053651
The first sentence in the article is not included in the result.
Hello. When I try the code with the following article: http://www.nbcnews.com/tech/mobile/facebook-lite-app-launched-tap-emerging-markets-n370481
I get this:
has launched a stripped-down version of its app aimed at the huge potential user base in emerging markets.
The social media giant that it was testing the Android app in January, and was now rolling it out across countries in Asia, followed by parts of Latin America, Africa and Europe.
(and more...) Basically, it's stripping out the anchor tags, along with their text, within the main article.
http://tech.sina.com.cn/t/2016-04-20/doc-ifxrpvcy4243221.shtml
I get an error when handling this link.