node-read

Links that are not working correctly. [post them here]

Open bndr opened this issue 10 years ago • 10 comments

If you find any links that node-read cannot correctly parse, please post them here.

bndr avatar Apr 27 '14 19:04 bndr

http://bits.blogs.nytimes.com/2014/04/26/writing-in-a-nonstop-world

(node) warning: possible EventEmitter memory leak detected. 11 listeners added. Use emitter.setMaxListeners() to increase limit.
Trace
    at Request.EventEmitter.addListener (events.js:160:15)
    at Request.self._buildRequest (~/Desktop/temp2/node_modules/node-read/node_modules/request/request.js:366:10)
    at Request.init (~/Desktop/temp2/node_modules/node-read/node_modules/request/request.js:503:10)
    at Request.onResponse (~/Desktop/temp2/node_modules/node-read/node_modules/request/request.js:899:10)
    at ClientRequest.g (events.js:180:16)
    at ClientRequest.EventEmitter.emit (events.js:95:17)
    at HTTPParser.parserOnIncomingClient [as onIncoming] (http.js:1688:21)
    at HTTPParser.parserOnHeadersComplete [as onHeadersComplete] (http.js:121:23)
    at Socket.socketOnData [as ondata] (http.js:1583:20)
    at TCP.onread (net.js:527:27)
(node) warning: possible EventEmitter memory leak detected. 11 listeners added. Use emitter.setMaxListeners() to increase limit.
Trace
    at Request.EventEmitter.addListener (events.js:160:15)
    at Request.start (~/Desktop/temp2/node_modules/node-read/node_modules/request/request.js:700:8)
    at Request.end (~/Desktop/temp2/node_modules/node-read/node_modules/request/request.js:1319:28)
    at ~/Desktop/temp2/node_modules/node-read/node_modules/request/request.js:418:14
    at process._tickCallback (node.js:415:13)

~/Desktop/temp2/node_modules/node-read/index.js:78
      parseDOM(buffer.toString("utf8"), res);
                      ^
TypeError: Cannot call method 'toString' of undefined
    at Request._callback (~/Desktop/temp2/node_modules/node-read/index.js:78:23)
    at self.callback (~/Desktop/temp2/node_modules/node-read/node_modules/request/request.js:121:22)
    at Request.EventEmitter.emit (events.js:95:17)
    at Request.onResponse (~/Desktop/temp2/node_modules/node-read/node_modules/request/request.js:857:12)
    at ClientRequest.g (events.js:180:16)
    at ClientRequest.EventEmitter.emit (events.js:95:17)
    at HTTPParser.parserOnIncomingClient [as onIncoming] (http.js:1688:21)
    at HTTPParser.parserOnHeadersComplete [as onHeadersComplete] (http.js:121:23)
    at Socket.socketOnData [as ondata] (http.js:1583:20)
    at TCP.onread (net.js:527:27)

scheeser avatar Apr 28 '14 15:04 scheeser

There's a known bug between nytimes and the request library:
https://github.com/mikeal/request/issues/311#issuecomment-10237355
https://github.com/mikeal/request/issues/673
https://github.com/mikeal/request/issues/865

Not much I can do here.
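
For what it's worth, the final TypeError in the trace above (calling toString on an undefined body) could at least be surfaced as a normal error instead of a crash. A minimal sketch, assuming a Node-style callback; `handleResponse` is a hypothetical helper and not node-read's actual code, and it does nothing about the underlying request issue:

```javascript
// Hypothetical guard, not part of node-read: turn a missing response body
// into an error passed to the callback instead of an uncaught TypeError.
function handleResponse(err, res, buffer, callback) {
  if (err) {
    return callback(err);
  }
  if (buffer === undefined || buffer === null) {
    return callback(new Error('Empty response body'));
  }
  // Accept either a Buffer or an already-decoded string.
  const html = Buffer.isBuffer(buffer) ? buffer.toString('utf8') : String(buffer);
  return callback(null, html, res);
}
```

With a guard like this the caller gets an `err` for the nytimes link rather than the process dying.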

bndr avatar Apr 28 '14 17:04 bndr

Trailing white space in title from: http://www.howiechong.com/journal/2014/2/bike-helmets

simonccarter avatar May 15 '14 10:05 simonccarter

Thanks! Fixed that.

bndr avatar May 15 '14 18:05 bndr

First of all, thanks for doing this! The jsdom one was way too slow for my purposes and I've become increasingly frustrated by it. The performance improvements are off the charts!

A couple minor issues popped up when I migrated to your version:

The h1 tag is coming back inside a p tag. http://www.bbc.co.uk/sport/0/football/26060048

Also, in node-readability I believe these entities (&#8217; &#039; &quot; &#8220; &#8221;) were decoded automatically. Up to you whether you want to mimic that behaviour or not.
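
In case it helps, here is a rough sketch of the kind of decoding meant here. It is only an illustration: `decodeEntities` and the small `NAMED` table are made-up names, not node-read or node-readability code, and the table covers just a handful of named entities.

```javascript
// Hypothetical helper: decode numeric entities and a few named ones.
const NAMED = { quot: '"', amp: '&', apos: "'", lt: '<', gt: '>' };

function decodeEntities(text) {
  return text.replace(/&(#x[0-9a-f]+|#[0-9]+|[a-z]+);/gi, (match, body) => {
    if (body[0] === '#') {
      const isHex = body[1] === 'x' || body[1] === 'X';
      const code = parseInt(body.slice(isHex ? 2 : 1), isHex ? 16 : 10);
      // Leave out-of-range references untouched rather than throwing.
      return code <= 0x10ffff ? String.fromCodePoint(code) : match;
    }
    // Unknown named entities are left as they are.
    return Object.prototype.hasOwnProperty.call(NAMED, body) ? NAMED[body] : match;
  });
}
```

The replacement runs in a single pass, so an escaped reference such as `&amp;quot;` comes out as `&quot;` and is not decoded a second time.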

midknight41 avatar Aug 31 '14 09:08 midknight41

Thanks! I'll look into it.

bndr avatar Aug 31 '14 10:08 bndr

Another issue I'm afraid...

http://www.bbc.co.uk/sport/0/football/29053651

The first sentence in the article is not included in the result.

midknight41 avatar Sep 04 '14 10:09 midknight41

Hello, when I try the code with the following article: http://www.nbcnews.com/tech/mobile/facebook-lite-app-launched-tap-emerging-markets-n370481

I get this:

has launched a stripped-down version of its app aimed at the huge potential user base in emerging markets.

The social media giant that it was testing the Android app in January, and was now rolling it out across countries in Asia, followed by parts of Latin America, Africa and Europe.

(and more...)

Basically, it's stripping out the anchor tags, along with their text, within the main article.
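
One possible workaround, sketched under the assumption that the anchors are dropped during cleanup: unwrap each anchor to its inner content before the cleanup runs, so the link text survives. `unwrapAnchors` is a hypothetical helper, not something node-read provides, and a regex is only a rough stand-in for proper DOM handling.

```javascript
// Hypothetical pre-processing step: replace <a ...>text</a> with text,
// so a later cleanup pass cannot discard the words inside the link.
function unwrapAnchors(html) {
  return html.replace(/<a\b[^>]*>([\s\S]*?)<\/a>/gi, '$1');
}
```

This loses the href, so it only addresses the missing words, not the missing links.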

kannanth avatar Jun 05 '15 15:06 kannanth

A simple link tag does not keep its link when read through article.content:

Ministério das Comunicações

rodrigocprates avatar Jun 08 '15 04:06 rodrigocprates

http://tech.sina.com.cn/t/2016-04-20/doc-ifxrpvcy4243221.shtml

Error when handling this link.

neurosurgeonX avatar Apr 21 '16 15:04 neurosurgeonX