http
http copied to clipboard
`content_length` raises an exception instead of `nil` or `Fixnum`
In some rare cases the method def content_length
raises TypeError
instead of nil
or Fixnum
. For example, you can check the following scenario:
Reproduction code
url = "https://imagery.pragprog.com/products/595/tpp20.jpg"
response = HTTP.get(url)
response.content_length
Expected behavior
Expected nil
or Fixnum
instead I get
TypeError: can't convert Array into Integer
The response
object is:
#<HTTP::Response/1.1 200 OK {"Accept-Ranges"=>["bytes", "bytes"], "Age"=>"4590838", "Cache-Control"=>"public max-age=315360000", "Content-Length"=>["2802494", "2802494"], "Date"=>"Fri, 20 Sep 2019 14:33:37 GMT", "Etag"=>"\"8121dc48ec937ecf919bc2c54aa961a4\"", "Expires"=>"Thu, 31 Dec 2037 23:55:55 GMT", "Last-Modified"=>"Mon, 06 May 2019 20:19:24 GMT", "Server"=>"imagery3", "Via"=>"1.1 varnish", "X-Varnish"=>"312290075 306038822", "X-Amz-Id-2"=>"H6BrRPBmJ2AKray6L+xApTdXlP511cVEXJ6DnDMEYVPcmaWJCGpzs3DTNiRez2x+SuLsQxZcUfI=", "X-Amz-Request-Id"=>"7ED8CC65AA4FCF29", "Connection"=>"Close"}>
The value of Content-Length
is "Content-Length"=>["2802494", "2802494"]
which is suspicious.
Looks like the server is returning the Content-Length
header twice:
$ http https://imagery.pragprog.com/products/595/tpp20.jpg
HTTP/1.1 200 OK
ACCEPT_RANGES: bytes
Accept-Ranges: bytes
Age: 4603039
CONTENT_LENGTH: 2802494
Cache-Control: public max-age=315360000
Connection: keep-alive
Content-Length: 2802494
Date: Fri, 20 Sep 2019 17:56:57 GMT
ETAG: "8121dc48ec937ecf919bc2c54aa961a4"
Expires: Thu, 31 Dec 2037 23:55:55 GMT
LAST_MODIFIED: Mon, 06 May 2019 20:19:24 GMT
Server: imagery3
Via: 1.1 varnish
X-Varnish: 312319203 306038822
X_AMZ_ID_2: H6BrRPBmJ2AKray6L+xApTdXlP511cVEXJ6DnDMEYVPcmaWJCGpzs3DTNiRez2x+SuLsQxZcUfI=
X_AMZ_REQUEST_ID: 7ED8CC65AA4FCF29
I believe we should use #last
value of content-length headers.
Wow, that's crazy. Any prior art on what other HTTP clients do in this case?
Yeah. Would be nice to know which is default behaviour other clients use. I see there are 3 ways on how to act in given scenario:
- consider first value only
- consider last value only
- treat as response with no Content-Length
some sort of mixture of the above variant:
def content_length
values = headers.get("Content-Length").uniq
values.first.to_i if 1 == values.size
end
Kinda pseudo-code...
I seem to recall reading somewhere that some headers allow multiple values separated by some delimiter, and if one of those headers is specified multiple times then the client should consider that equivalent to it being specified once with the values separated by the delimiter.
In the case of headers that do not allow multiple values, then later duplicate header values should replace ones specified earlier.
I don't know if there's a comprehensive list of which headers allow multiple values and their associated delimiters (because the delimiters can vary between headers). I also don't know if we want to deal with any of that, or just punt and special-case the Content-Length header with values.is_a?(Array) ? values.last : values
.
https://www.w3.org/Protocols/rfc2616/rfc2616-sec4.html Section 4.2:
Multiple message-header fields with the same field-name MAY be present in a message if and only if the entire field-value for that header field is defined as a comma-separated list [i.e., #(values)].
https://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.13 Content-Length is not one of the multliple value headers.
I did some experiments with Faraday
and Httparty
.
Faraday
It seems Faraday keeps only one value at headers.
url = "https://imagery.pragprog.com/products/595/tpp20.jpg"
response = Faraday.get url
puts response.headers['content_length'] #=> "2802494"
HTTParty
Same with this gem. It keeps only one value
url = "https://imagery.pragprog.com/products/595/tpp20.jpg"
response = HTTParty.get url
puts response.headers['content_lenght'] #=> "2802494"
I can examine more clients if you want to
I’m curious what curl and major notable browsers do
I've wrote a small testing server:
# frozen_string_literal: true
require "socket"
server = TCPServer.new(3000)
loop do
Thread.new(server.accept) do |io|
status = 200
headers = []
case io.gets[%r{^GET /(\d+) HTTP}, 1]
when "1"
headers << "Content-Length: 3"
headers << "Content-Length: 3"
when "2"
headers << "Content-Length: 9"
headers << "Content-Length: 3"
when "3"
headers << "Content-Length: 3"
headers << "Content-Length: 9"
when "4"
headers << "Content-Length: 9"
headers << "Content-Length: 9"
else
status = 500
end
io << "HTTP/1.1 #{status}\r\n"
headers.each { |v| io << v << "\r\n" }
io << "\r\n123456789"
rescue => e
warn e
ensure
io.close
end
end
And that's the behaviour of different clients:
HTTPie
Ensures that if there are multiple Content-Length - all are the same
CURL
Uses last given Content-Length
Firefox
Similar behaviour to HTTPie
- If both Content-Lengths are the same - use it
- When they mismatch - fail:
Personally I prefer failing in case if multiple different content length values are given:
def content_length
values = headers.get("Content-Length").uniq
raise "WTF? o_O" if 1 < values.size
values.first&.to_i
end
Worth to test how they handle absolutely undxpected value of content length as well.
Enhanced dummy server with couple of more scenarios:
# frozen_string_literal: true
require "socket"
server = TCPServer.new(3000)
loop do
Thread.new(server.accept) do |io|
status = 200
headers = []
case io.gets[%r{^GET /(\d+) HTTP}, 1]
when "1"
headers << "Content-Length: 3"
headers << "Content-Length: 3"
when "2"
headers << "Content-Length: 9"
headers << "Content-Length: 3"
when "3"
headers << "Content-Length: 3"
headers << "Content-Length: 9"
when "4"
headers << "Content-Length: 9"
headers << "Content-Length: 9"
when "5"
headers << "Content-Length: X"
headers << "Content-Length: 9"
when "6"
headers << "Content-Length: 9"
headers << "Content-Length: X"
when "7"
headers << "Content-Length: 3"
headers << "Content-Length: X"
headers << "Content-Length: 9"
when "8"
headers << "Content-Length: 9"
headers << "Content-Length: X"
headers << "Content-Length: 3"
when "9"
headers << "Content-Length: 9"
headers << "Content-Length: X"
headers << "Content-Length: 9"
else
status = 500
end
io << "HTTP/1.1 #{status}\r\n"
headers.each { |v| io << v << "\r\n" }
io << "\r\n123456789" unless 500 == status
rescue => e
warn e
ensure
io.close
end
end
HTTPie
curl
Firefox
Allows multiple values as long as result is unambiguous (all values are the same).
The response includes a Content-Length
and CONTENT_LENGTH
header. It seems like http.rb is treating them as equivalent (probably due to the normalizing), but I believe it should treat them as two separate, unrelated headers.
The response includes a
Content-Length
andCONTENT_LENGTH
header. It seems like http.rb is treating them as equivalent (probably due to the normalizing), but I believe it should treat them as two separate, unrelated headers.
Right. So we have to tackle 2 problems here, I believe.
This seems relevant:
https://tools.ietf.org/html/rfc7230#section-3.3.2
If a message is received that has multiple Content-Length header fields with field-values consisting of the same decimal value, or a single Content-Length header field with a field value containing a list of identical decimal values (e.g., "Content-Length: 42, 42"), indicating that duplicate Content-Length header fields have been generated or combined by an upstream message processor, then the recipient MUST either reject the message as invalid or replace the duplicated field-values with a single valid Content-Length field containing that decimal value prior to determining the message body length or forwarding the message.