Fetching a feed with redirects not working

Open tak1n opened this issue 8 years ago • 5 comments

We currently have a problem where a feed we want to fetch finally resolves to 127.0.0.1:80.

The address of the feed is http://www.jennstrends.com/feed which redirects to http://www.jennstrends.com/feed/. In a normal browser everything works.

I've modified HTTParty::Request#perform to this:

def perform(&block)
  validate
  setup_raw_request
  chunked_body = nil

  self.last_response = http.request(@raw_request) do |http_response|
    if block
      chunks = []

      http_response.read_body do |fragment|
        chunks << fragment unless options[:stream_body]
        block.call(fragment)
      end

      chunked_body = chunks.join
    end
  end

  handle_deflation unless http_method == Net::HTTP::Head
  puts "Response status: #{last_response.code}"
  puts "Response redirects: #{response_redirects?}"
  puts "Redirect location: #{last_response['location']}"
  puts ""
  handle_host_redirection if response_redirects?
  handle_response(chunked_body, &block)
end

In HTTParty, the following happens:

[1] pry(main)> HTTParty.get('http://www.jennstrends.com/feed')
Response status: 301
Response redirects: true
Redirect location: http://www.jennstrends.com/feed/

Response status: 302
Response redirects: true
Redirect location: http://127.0.0.1

Response status: 200
Response redirects: 
Redirect location: 

=> "<!DOCTYPE html>\n<html>\n<head>\n<title>Welcome to nginx!</title>\n<style>\n    body {\n        width: 35em;\n        margin: 0 auto;\n        font-family: Tahoma, Verdana, Arial, sans-serif;\n    }\n</style>\n</head>\n<body>\n<h1>Welcome to nginx!</h1>\n<p>If you see this page, the nginx web server is successfully installed and\nworking. Further configuration is required.</p>\n\n<p>For online documentation and support please refer to\n<a href=\"http://nginx.org/\">nginx.org</a>.<br/>\nCommercial support is available at\n<a href=\"http://nginx.com/\">nginx.com</a>.</p>\n\n<p><em>Thank you for using nginx.</em></p>\n</body>\n</html>\n"

I know this is a misconfiguration of the site providing the feed, but Chrome, for example, does not redirect me to localhost and displays the feed properly. This leads me to the question: how can Chrome do the "right" thing even though the site is obviously not configured properly?

Feel free to close this if there is nothing we can do here. I will contact the site owner and let them know that there is probably a misconfiguration in their system.

tak1n avatar Nov 09 '16 13:11 tak1n

Hmm. curl seems to do the "right" thing too.

$ curl -ISL http://www.jennstrends.com/feed
HTTP/1.1 301 Moved Permanently
...
Location: http://www.jennstrends.com/feed/
...

HTTP/1.1 200 OK
...

I used the curl formatter for httparty logging, which basically gives you the debugging above for free:

require "logger"
> HTTParty.get('http://www.jennstrends.com/feed', logger: Logger.new(STDOUT), log_format: :curl)
I, [2016-11-18T21:59:25.745820 #92246]  INFO -- : [HTTParty] [2016-11-18 21:59:25 -0500] > GET http://www.jennstrends.com/feed
[HTTParty] [2016-11-18 21:59:25 -0500] > 
[HTTParty] [2016-11-18 21:59:25 -0500] < HTTP/1.1 301
...
[HTTParty] [2016-11-18 21:59:25 -0500] < Location: http://www.jennstrends.com/feed/
...

[HTTParty] [2016-11-18 21:59:25 -0500] < 
I, [2016-11-18T21:59:25.830156 #92246]  INFO -- : [HTTParty] [2016-11-18 21:59:25 -0500] > GET http://www.jennstrends.com/feed/
[HTTParty] [2016-11-18 21:59:25 -0500] > Headers: 
...
[HTTParty] [2016-11-18 21:59:25 -0500] < Location: http://127.0.0.1
...
[HTTParty] [2016-11-18 21:59:25 -0500] < 
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>302 Found</title>
</head><body>
<h1>Found</h1>
<p>The document has moved <a href="http://127.0.0.1">here</a>.</p>
</body></html>

[HTTParty] [2016-11-18 21:59:25 -0500] < 
I, [2016-11-18T21:59:25.832286 #92246]  INFO -- : [HTTParty] [2016-11-18 21:59:25 -0500] > GET http://127.0.0.1
...
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
    body {
        width: 35em;
        margin: 0 auto;
        font-family: Tahoma, Verdana, Arial, sans-serif;
    }
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>

<p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p>

<p><em>Thank you for using nginx.</em></p>
</body>
</html>

[HTTParty] [2016-11-18 21:59:25 -0500] < 

Because fetching http://www.jennstrends.com/feed/ directly in httparty works, but http://www.jennstrends.com/feed without the trailing slash doesn't, this definitely feels like an httparty bug or a bug in something httparty is using, but I'm not sure what it could be just yet.
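One way to narrow this down might be to follow the redirects by hand with plain Net::HTTP and log every hop, so the chain can be compared against the httparty log above. The following is only a minimal sketch: `trace_redirects`, its `limit` and `fetch` parameters, and `NET_HTTP_FETCH` are hypothetical names made up for this example, not part of httparty.

```ruby
require "net/http"
require "uri"

# Default fetcher: a single plain Net::HTTP GET that does not follow redirects.
# Returns [status_code, location_header_or_nil].
NET_HTTP_FETCH = lambda do |uri|
  response = Net::HTTP.get_response(uri)
  [response.code.to_i, response["location"]]
end

# Follows redirects one hop at a time and records each hop as a hash.
# Stops at the first non-redirect response or after `limit` requests.
# (Hypothetical helper for debugging; not an httparty API.)
def trace_redirects(url, limit: 10, fetch: NET_HTTP_FETCH)
  hops = []
  uri = URI.parse(url)

  limit.times do
    status, location = fetch.call(uri)
    hops << { url: uri.to_s, status: status, location: location }
    return hops unless (300..399).cover?(status) && location

    # Resolve relative Location values against the current URL.
    uri = URI.join(uri.to_s, location)
  end

  hops
end

# Example (performs real network requests, so it is left commented out):
# trace_redirects("http://www.jennstrends.com/feed").each { |hop| puts hop.inspect }
```

If the plain Net::HTTP trace also ends at 127.0.0.1, the difference from curl is more likely in what the request looks like to the server than in httparty's redirect handling. The `fetch` parameter exists so the loop can be exercised with canned responses instead of the network.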

jnunemaker avatar Nov 19 '16 03:11 jnunemaker

@jnunemaker we have another feed that causes problems; it raises the HTTParty::RedirectionTooDeep exception.

Here is the log output when using the curl log format for httparty: https://gist.github.com/tak1n/392a603029a8665b5ca5ff840946bbb6

I think these two errors are related somehow. My guess is that curl (and other libraries/browsers) stop following redirects when the location is localhost or the same URL as the origin?

For http://www.jennstrends.com/feed the problem is that the Location URL is 127.0.0.1 at some point.
For http://krugman.blogs.nytimes.com/feed/ the problem is that the Location URL is the request URL (and therefore an endless loop).
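To make the two patterns concrete, here is a rough sketch of a check a caller could run on each Location header before following it. It says nothing about what curl or Chrome actually do internally; `classify_redirect` and its return symbols are hypothetical names for this example only.

```ruby
require "ipaddr"
require "uri"

# Classifies a redirect target relative to the URL that was requested.
# Returns :self_redirect, :loopback, or :ok.
# (Hypothetical helper; not part of httparty.)
def classify_redirect(request_url, location)
  # Resolve relative Location values against the request URL.
  target = URI.join(request_url, location)
  return :self_redirect if target.to_s == URI.parse(request_url).to_s

  host = target.hostname.to_s
  return :ok if host.empty?
  return :loopback if host.casecmp?("localhost")

  begin
    return :loopback if IPAddr.new(host).loopback?
  rescue IPAddr::InvalidAddressError
    # Host is a DNS name rather than an IP literal; nothing more to check.
  end

  :ok
end

puts classify_redirect("http://www.jennstrends.com/feed/", "http://127.0.0.1")
puts classify_redirect("http://krugman.blogs.nytimes.com/feed/",
                       "http://krugman.blogs.nytimes.com/feed/")
```

The check only looks at the URL text: a hostname that resolves to 127.0.0.1 through DNS would still be reported as `:ok`, and a self-redirect that relies on cookies being set between hops is not necessarily an endless loop.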

tak1n avatar Feb 17 '17 08:02 tak1n

@tak1n I just went to the Krugman NYTimes site in Chrome and got a "too many redirects" error, so it seems like httparty behaves like Chrome, at least on that one.

jnunemaker avatar Feb 17 '17 14:02 jnunemaker

I was able to get it to eventually render with curl:

$ curl -isL http://krugman.blogs.nytimes.com/feed | grep "HTTP/1.1" | wc -l
      15

jnunemaker avatar Feb 17 '17 14:02 jnunemaker

@jnunemaker weird, for me the above link to the Krugman feed works fine in Chrome :smile:

Chrome version: 56.0.2924.76 (64-bit)

tak1n avatar Feb 17 '17 16:02 tak1n