readability
Exceeded maxRedirects with nytimes.com links
(Just leaving this here, will investigate a bit later)
Given a New York Times URL such as this:
http://www.nytimes.com/2016/07/12/technology/pokemon-go-brings-augmented-reality-to-a-mass-audience.html
The request will fail with this error:
Error: Exceeded maxRedirects. Probably stuck in a redirect loop http://www.nytimes.com/2016/07/12/technology/pokemon-go-brings-augmented-reality-to-a-mass-audience.html?_r=4
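For reference, a minimal reproduction with the bare `request` library looks like this (my assumption of what happens under the hood; this is just the default-options behavior, not necessarily exactly how readability issues the fetch):

```js
const request = require('request');

const url = 'http://www.nytimes.com/2016/07/12/technology/pokemon-go-brings-augmented-reality-to-a-mass-audience.html';

// With default options, request follows redirects but does not carry
// cookies between hops, so the server keeps redirecting until the
// default maxRedirects limit (10) is exhausted.
request(url, (err, resp, body) => {
  if (err) console.error(err.message); // "Exceeded maxRedirects. ..."
});
```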
Note that nytimes.com has some convoluted server configuration and returns an HTTP 303; you'll get the same redirect behavior with cURL:
$ curl -IL http://www.nytimes.com/2016/07/12/technology/pokemon-go-brings-augmented-reality-to-a-mass-audience.html
HTTP/1.1 303 See Other
Server: Varnish
location: https://myaccount.nytimes.com/auth/login?URI=http%3A%2F%2Fwww.nytimes.com%2F2016%2F07%2F12%2Ftechnology%2Fpokemon-go-brings-augmented-reality-to-a-mass-audience.html%3F_r%3D5&REFUSE_COOKIE_ERROR=SHOW_ERROR
Accept-Ranges: bytes
Date: Tue, 12 Jul 2016 12:12:38 GMT
Age: 0
X-API-Version: 5-0
X-PageType: article
Connection: close
X-Frame-Options: DENY
Set-Cookie: RMID=007f010123545784deb60008;Path=/; Domain=.nytimes.com;Expires=Wed, 12 Jul 2017 12:12:38 UTC
HTTP/1.1 200 OK
Date: Tue, 12 Jul 2016 12:12:41 GMT
Content-Type: text/html; charset=UTF-8
Connection: keep-alive
Set-Cookie: __cfduid=dce29bea6d432f3d2e44a8bbe3e1220aa1468325561; expires=Wed, 12-Jul-17 12:12:41 GMT; path=/; domain=.nytimes.com; HttpOnly
Vary: Accept-Encoding
Cache-Control: max-age=0, no-cache
Cneonction: close
Server: cloudflare-nginx
CF-RAY: 2c1467a827722507-ORD
Heavier HTTP clients, such as wget with its default settings, can deal with this, as can libraries like Python's Requests. I'm new to Node, so I'm not sure what the best-practice route is.
Ah OK, now I remember what the hack for nytimes.com is: keep the cookies during the redirects. I'm not sure whether setting this option to true has implications for the general use case, so I'll leave it here FYI:
https://github.com/request/request#examples
```js
const request = require('request');
// jar: true keeps cookies across the redirect chain
request({jar: true, url: url}, (err, resp, body) => { console.log(body) });
```
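If the global option worries you, a per-request cookie jar scopes the cookies to a single call instead; `request.jar()` is part of the same library (a sketch, reusing the `url` variable from above):

```js
const request = require('request');

// An isolated cookie jar for just this request, instead of the
// library-wide jar that `jar: true` enables.
const cookieJar = request.jar();

request({jar: cookieJar, url: url}, (err, resp, body) => {
  if (err) return console.error(err);
  console.log(body);
});
```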
+1 for this, having the same issue here
+1, had the same issue with washingtonpost.com URLs. Fixed by setting `jar: true`:

```js
request({jar: true, url: url}, (err, resp, body) => { console.log(body) });
```
Hi guys, I was giving Cheerio a try, attempting to scrape some info from: http://www.bna.com.ar/Personas
My code:

```js
const request = require('request');
const cheerio = require('cheerio');

request({jar: true, url: 'http://www.bna.com.ar/Personas'}, (err, res, html) => {
  if (!err && res.statusCode == 200) {
    console.log(html);
  } else {
    console.log(err);
  }
});
```
But it's giving me the following error:
```
PS C:\Users\Federico\source\repos\Ws> node index
(node:12196) MaxListenersExceededWarning: Possible EventEmitter memory leak detected. 11 pipe listeners added. Use emitter.setMaxListeners() to increase limit
Error: Exceeded maxRedirects. Probably stuck in a redirect loop http://www.bna.com.ar/Error?aspxerrorpath=/Error/ErrorPage
    at Redirect.onResponse (C:\Users\Federico\source\repos\Ws\node_modules\request\lib\redirect.js:98:27)
    at Request.onRequestResponse (C:\Users\Federico\source\repos\Ws\node_modules\request\request.js:993:22)
    at ClientRequest.emit (events.js:198:13)
    at HTTPParser.parserOnIncomingClient [as onIncoming] (_http_client.js:556:21)
    at HTTPParser.parserOnHeadersComplete (_http_common.js:109:17)
    at Socket.socketOnData (_http_client.js:442:20)
    at Socket.emit (events.js:198:13)
    at addChunk (_stream_readable.js:288:12)
    at readableAddChunk (_stream_readable.js:269:11)
    at Socket.Readable.push (_stream_readable.js:224:10)
```
I tried adding `jar: true` but it's not working. Any clue?
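I don't know bna.com.ar, but you can at least see where the loop happens: request's `followRedirect` option can be a function that receives each redirect response, so you can log the hops (a debugging sketch, not a fix):

```js
const request = require('request');

request({
  jar: true,
  url: 'http://www.bna.com.ar/Personas',
  // followRedirect may be a function: it gets each 3xx response and
  // returns whether to keep following. Log each hop to find the loop.
  followRedirect: (res) => {
    console.log(res.statusCode, '->', res.headers.location);
    return true;
  }
}, (err, res, html) => {
  if (err) return console.error(err.message);
  console.log('final status:', res.statusCode);
});
```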
jar:true solved my problem, +1
Hi, please write in English.