readability
Exceeded maxRedirects with nytimes.com links
(Just leaving this here, will investigate a bit later)
Given a New York Times URL such as this:
http://www.nytimes.com/2016/07/12/technology/pokemon-go-brings-augmented-reality-to-a-mass-audience.html
The request will fail with this error:
Error: Exceeded maxRedirects. Probably stuck in a redirect loop http://www.nytimes.com/2016/07/12/technology/pokemon-go-brings-augmented-reality-to-a-mass-audience.html?_r=4
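For reference, a minimal reproduction with the bare `request` library looks like this (my assumption of what happens under the hood; this is just the default-options behavior, not necessarily exactly how readability issues the fetch):

```js
const request = require('request');

const url = 'http://www.nytimes.com/2016/07/12/technology/pokemon-go-brings-augmented-reality-to-a-mass-audience.html';

// With default options, request follows redirects but does not carry
// cookies between hops, so the server keeps redirecting until the
// default maxRedirects limit (10) is exhausted.
request(url, (err, resp, body) => {
  if (err) console.error(err.message); // "Exceeded maxRedirects. ..."
});
```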
Note that nytimes.com has some convoluted server configuration and returns an HTTP 303; you'll get the same redirect behavior with cURL:
$ curl -IL http://www.nytimes.com/2016/07/12/technology/pokemon-go-brings-augmented-reality-to-a-mass-audience.html
HTTP/1.1 303 See Other
Server: Varnish
location: https://myaccount.nytimes.com/auth/login?URI=http%3A%2F%2Fwww.nytimes.com%2F2016%2F07%2F12%2Ftechnology%2Fpokemon-go-brings-augmented-reality-to-a-mass-audience.html%3F_r%3D5&REFUSE_COOKIE_ERROR=SHOW_ERROR
Accept-Ranges: bytes
Date: Tue, 12 Jul 2016 12:12:38 GMT
Age: 0
X-API-Version: 5-0
X-PageType: article
Connection: close
X-Frame-Options: DENY
Set-Cookie: RMID=007f010123545784deb60008;Path=/; Domain=.nytimes.com;Expires=Wed, 12 Jul 2017 12:12:38 UTC
HTTP/1.1 200 OK
Date: Tue, 12 Jul 2016 12:12:41 GMT
Content-Type: text/html; charset=UTF-8
Connection: keep-alive
Set-Cookie: __cfduid=dce29bea6d432f3d2e44a8bbe3e1220aa1468325561; expires=Wed, 12-Jul-17 12:12:41 GMT; path=/; domain=.nytimes.com; HttpOnly
Vary: Accept-Encoding
Cache-Control: max-age=0, no-cache
Cneonction: close
Server: cloudflare-nginx
CF-RAY: 2c1467a827722507-ORD
Heavier HTTP clients, such as wget with its default settings, can deal with this, as can libraries like Python's Requests. I'm new to Node, so I'm not sure what the best-practice route is.
Ah OK, now I remember what the hack for nytimes.com is: keep the cookies during the redirects. I'm not sure whether setting this option to true has implications for the general use case, so I'll leave it here FYI:
https://github.com/request/request#examples
```js
const request = require('request');
// jar: true keeps cookies across the redirect chain
request({jar: true, url: url}, (err, resp, body) => { console.log(body) });
```
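If the global option worries you, a per-request cookie jar scopes the cookies to a single call instead; `request.jar()` is part of the same library (a sketch, reusing the `url` variable from above):

```js
const request = require('request');

// An isolated cookie jar for just this request, instead of the
// library-wide jar that `jar: true` enables.
const cookieJar = request.jar();

request({jar: cookieJar, url: url}, (err, resp, body) => {
  if (err) return console.error(err);
  console.log(body);
});
```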
+1 for this, having the same issue here
+1, had the same issue with washingtonpost.com URLs. Fixed by setting `jar: true`:

```js
request({jar: true, url: url}, (err, resp, body) => { console.log(body) });
```
Hi guys, I was giving Cheerio a try, attempting to scrape some info from: http://www.bna.com.ar/Personas
My code:

```js
const request = require('request');
const cheerio = require('cheerio');

request({jar: true, url: 'http://www.bna.com.ar/Personas'}, (err, res, html) => {
  if (!err && res.statusCode == 200) {
    console.log(html);
  } else {
    console.log(err);
  }
});
```
But it's giving me the following error:
```
PS C:\Users\Federico\source\repos\Ws> node index
(node:12196) MaxListenersExceededWarning: Possible EventEmitter memory leak detected. 11 pipe listeners added. Use emitter.setMaxListeners() to increase limit
Error: Exceeded maxRedirects. Probably stuck in a redirect loop http://www.bna.com.ar/Error?aspxerrorpath=/Error/ErrorPage
    at Redirect.onResponse (C:\Users\Federico\source\repos\Ws\node_modules\request\lib\redirect.js:98:27)
    at Request.onRequestResponse (C:\Users\Federico\source\repos\Ws\node_modules\request\request.js:993:22)
    at ClientRequest.emit (events.js:198:13)
    at HTTPParser.parserOnIncomingClient [as onIncoming] (_http_client.js:556:21)
    at HTTPParser.parserOnHeadersComplete (_http_common.js:109:17)
    at Socket.socketOnData (_http_client.js:442:20)
    at Socket.emit (events.js:198:13)
    at addChunk (_stream_readable.js:288:12)
    at readableAddChunk (_stream_readable.js:269:11)
    at Socket.Readable.push (_stream_readable.js:224:10)
```
I tried adding `jar: true` but it's not working. Any clue?
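I don't know bna.com.ar, but you can at least see where the loop happens: request's `followRedirect` option can be a function that receives each redirect response, so you can log the hops (a debugging sketch, not a fix):

```js
const request = require('request');

request({
  jar: true,
  url: 'http://www.bna.com.ar/Personas',
  // followRedirect may be a function: it gets each 3xx response and
  // returns whether to keep following. Log each hop to find the loop.
  followRedirect: (res) => {
    console.log(res.statusCode, '->', res.headers.location);
    return true;
  }
}, (err, res, html) => {
  if (err) return console.error(err.message);
  console.log('final status:', res.statusCode);
});
```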
jar:true solved my problem, +1
Hi, please write in English.