httparty
Fetching a feed with redirects not working
We currently have a problem where a feed we want to fetch finally resolves to 127.0.0.1:80.
The address of the feed is http://www.jennstrends.com/feed which redirects to http://www.jennstrends.com/feed/. In a normal browser everything works.
I've modified HTTParty::Request#perform to this:
def perform(&block)
  validate
  setup_raw_request

  chunked_body = nil

  self.last_response = http.request(@raw_request) do |http_response|
    if block
      chunks = []

      http_response.read_body do |fragment|
        chunks << fragment unless options[:stream_body]
        block.call(fragment)
      end

      chunked_body = chunks.join
    end
  end

  handle_deflation unless http_method == Net::HTTP::Head

  # Debug output added to trace the redirect chain
  puts "Response status: #{last_response.code}"
  puts "Response redirects: #{response_redirects?}"
  puts "Redirect location: #{last_response['location']}"
  puts ""

  handle_host_redirection if response_redirects?
  handle_response(chunked_body, &block)
end
In HTTParty, the following happens:
[1] pry(main)> HTTParty.get('http://www.jennstrends.com/feed')
Response status: 301
Response redirects: true
Redirect location: http://www.jennstrends.com/feed/
Response status: 302
Response redirects: true
Redirect location: http://127.0.0.1
Response status: 200
Response redirects:
Redirect location:
=> "<!DOCTYPE html>\n<html>\n<head>\n<title>Welcome to nginx!</title>\n<style>\n body {\n width: 35em;\n margin: 0 auto;\n font-family: Tahoma, Verdana, Arial, sans-serif;\n }\n</style>\n</head>\n<body>\n<h1>Welcome to nginx!</h1>\n<p>If you see this page, the nginx web server is successfully installed and\nworking. Further configuration is required.</p>\n\n<p>For online documentation and support please refer to\n<a href=\"http://nginx.org/\">nginx.org</a>.<br/>\nCommercial support is available at\n<a href=\"http://nginx.com/\">nginx.com</a>.</p>\n\n<p><em>Thank you for using nginx.</em></p>\n</body>\n</html>\n"
I know this is a misconfiguration of the site providing the feed, but Chrome, for example, does not redirect me to localhost and displays the feed properly. This leads me to the question: how can Chrome do the "right" thing even though the site is obviously not configured properly?
Feel free to close this if there is nothing we can do here; I will contact the page owner and let them know that there is probably a misconfiguration in their system.
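In the meantime I've worked around it on our side roughly like this (my own sketch, not anything HTTParty does itself; the helper name and hop limit are made up): turn off automatic redirect following with follow_redirects: false and follow the Location header by hand, refusing any hop that points at a loopback address.
require "httparty"
require "uri"

# Workaround sketch: follow redirects manually and refuse any Location
# that points at a loopback address, so a misconfigured upstream can't
# send the request to 127.0.0.1.
def fetch_skipping_loopback(url, max_hops = 5)
  max_hops.times do
    response = HTTParty.get(url, follow_redirects: false)
    location = response.headers["location"]

    # Not a redirect (or no Location header): we're done.
    return response unless response.code.between?(300, 399) && location

    next_url = URI.join(url, location).to_s
    host = URI.parse(next_url).host

    # Stop instead of following a redirect to localhost/loopback.
    if host == "localhost" || host.start_with?("127.")
      warn "Refusing redirect to loopback: #{next_url}"
      return response
    end

    url = next_url
  end

  raise "Too many redirects"
end

# fetch_skipping_loopback("http://www.jennstrends.com/feed")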
Hmm. curl seems to do the "right" thing too.
$ curl -ISL http://www.jennstrends.com/feed
HTTP/1.1 301 Moved Permanently
...
Location: http://www.jennstrends.com/feed/
...
HTTP/1.1 200 OK
...
I used the curl formatter for httparty logging, which basically gives you the debugging above for free:
require "logger"
> HTTParty.get('http://www.jennstrends.com/feed', logger: Logger.new(STDOUT), log_format: :curl)
I, [2016-11-18T21:59:25.745820 #92246] INFO -- : [HTTParty] [2016-11-18 21:59:25 -0500] > GET http://www.jennstrends.com/feed
[HTTParty] [2016-11-18 21:59:25 -0500] >
[HTTParty] [2016-11-18 21:59:25 -0500] < HTTP/1.1 301
...
[HTTParty] [2016-11-18 21:59:25 -0500] < Location: http://www.jennstrends.com/feed/
...
[HTTParty] [2016-11-18 21:59:25 -0500] <
I, [2016-11-18T21:59:25.830156 #92246] INFO -- : [HTTParty] [2016-11-18 21:59:25 -0500] > GET http://www.jennstrends.com/feed/
[HTTParty] [2016-11-18 21:59:25 -0500] > Headers:
...
[HTTParty] [2016-11-18 21:59:25 -0500] < Location: http://127.0.0.1
...
[HTTParty] [2016-11-18 21:59:25 -0500] <
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>302 Found</title>
</head><body>
<h1>Found</h1>
<p>The document has moved <a href="http://127.0.0.1">here</a>.</p>
</body></html>
[HTTParty] [2016-11-18 21:59:25 -0500] <
I, [2016-11-18T21:59:25.832286 #92246] INFO -- : [HTTParty] [2016-11-18 21:59:25 -0500] > GET http://127.0.0.1
...
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
body {
width: 35em;
margin: 0 auto;
font-family: Tahoma, Verdana, Arial, sans-serif;
}
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>
<p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p>
<p><em>Thank you for using nginx.</em></p>
</body>
</html>
[HTTParty] [2016-11-18 21:59:25 -0500] <
Because fetching http://www.jennstrends.com/feed/ directly in httparty works, but http://www.jennstrends.com/feed without the trailing slash doesn't, this definitely feels like an httparty bug or a bug in something httparty is using, but I'm not sure what it could be just yet.
@jnunemaker we have another feed that causes problems; it raises the HTTParty::RedirectionTooDeep exception.
Here is the log output when using the curl log format for httparty: https://gist.github.com/tak1n/392a603029a8665b5ca5ff840946bbb6
Those two errors seem related somehow. Could it be that curl (and other libraries/browsers) stop redirecting when the Location is localhost or the same URL as the origin?
For http://www.jennstrends.com/feed the problem is: the Location URL is 127.0.0.1 at some point.
For http://krugman.blogs.nytimes.com/feed/ the problem is: the Location URL is the request URL (therefore an endless loop).
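If that theory is right, the loop case could be caught the same way. Here is a rough sketch of the loop detection I have in mind (again my own workaround code, not something HTTParty does today; the helper name and hop limit are arbitrary): remember every URL already requested and stop as soon as a Location points back at one of them.
require "httparty"
require "set"
require "uri"

# Sketch of redirect-loop detection: keep a set of URLs we've already
# requested and bail out when a Location header points back at one of them.
def fetch_with_loop_detection(url, max_hops = 10)
  seen = Set.new

  max_hops.times do
    seen << url
    response = HTTParty.get(url, follow_redirects: false)
    location = response.headers["location"]

    return response unless response.code.between?(300, 399) && location

    next_url = URI.join(url, location).to_s

    # The Location is a URL we already requested -> redirect loop.
    raise "Redirect loop detected at #{next_url}" if seen.include?(next_url)

    url = next_url
  end

  raise "Too many redirects"
end

# fetch_with_loop_detection("http://krugman.blogs.nytimes.com/feed/")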
@tak1n I just went to the krugman nytimes site in chrome and got too many redirections, so it seems like httparty is similar to chrome at least on that one.
I was able to get it to eventually render with curl:
$ curl -isL http://krugman.blogs.nytimes.com/feed | grep "HTTP/1.1" | wc -l
15
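One more data point: if I read HTTParty::Request right, the default redirect limit is 5, so a chain that needs ~15 hops like the one curl shows would raise HTTParty::RedirectionTooDeep even without a real loop. Assuming the :limit option is merged into the request options (it appears to be, since the defaults are merged with whatever you pass in), something like this might get through when the chain does terminate:
require "httparty"

begin
  # Raise the redirect limit above the ~15 hops curl needed.
  response = HTTParty.get("http://krugman.blogs.nytimes.com/feed/", limit: 20)
  puts response.code
rescue HTTParty::RedirectionTooDeep => e
  # Still too deep after 20 hops -> likely a genuine server-side loop.
  puts "Gave up: #{e.message}"
end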
@jnunemaker weird, for me the above link to the krugman feed works fine in Chrome :smile:
Chrome version: 56.0.2924.76 (64-bit)