httr icon indicating copy to clipboard operation
httr copied to clipboard

Header passing fails with header that starts with HTTP_

Open ben519 opened this issue 6 years ago • 4 comments

I'm using httr to query some data from the Shopify API. Every call I make generates the warning message "In parse_http_status(lines[[1]]) : NAs introduced by coercion". For example,

response <- GET(
  url = "https://test-store.myshopify.com/admin/orders/count.json",
  httr::add_headers(Accept = "application/json"),
  httr::authenticate(user = "abc", password = "def")
)

Warning message:
In parse_http_status(lines[[1]]) : NAs introduced by coercion

It seems that the culprit is this function

parse_http_headers <- function(raw) {
  lines <- strsplit(rawToChar(raw), "\r?\n")[[1]]

  new_response <- grepl("^HTTP", lines)
  grps <- cumsum(new_response)

  lapply(unname(split(lines, grps)), parse_single_header)
}

specifically this bit: new_response <- grepl("^HTTP", lines) that attempts to pick out HTTP status lines from the vector of lines. Shopify's API always returns a header line like "HTTP_X_SHOPIFY_SHOP_API_CALL_LIMIT: 1/40" and this is falsely getting identified as an HTTP status line. Further down, it's getting parsed and an as.integer(status$status) bit it causing the coercion warning.

Can the regex used to identify HTTP status lines be tightened to prevent this false positive? Apologies - I don't know enough about possible HTTP status formats to make a good suggestion.

Thanks!

ben519 avatar May 07 '19 15:05 ben519

I'm interested in getting this fixed for an API I'm working with (the offending header in my case is HTTP_API_VERSION: v0.19.2) and happy to make a PR, but I also don't know enough about HTTP status formats!

Do status codes always start with HTTP/? Seems like all the ones being tested do (https://github.com/r-lib/httr/blob/master/tests/testthat/test-header.r#L24) so I'm happy to update the offending line to check for that if that's the case - or maybe what's needed is that the line starts with HTTP but not HTTP_, as the updated title indicates?

ty!

sharlagelfand avatar Jun 15 '20 14:06 sharlagelfand

Not sure if of any help. But today I got the exact same warning for another API (Caplena - non-public, so can't share any reprex) that is new, i.e. it never occured before and I didn't change any code.

The specifc API request is (where caplena_key is my API key):

library(httr)
response <- GET(url = paste0("https://api.caplena.com/api/projects/"),
                add_headers("Authorization" = paste0("apikey ", caplena_key)))

Warning message:
In parse_http_status(lines[[1]]) : NAs introduced by coercion

The problem is, if I now want to turn the response object into something readable, I was able to do so previously with:

projects <- content(response, encoding = "UTF-8")

This returned a list obbject I could then process further.

Now, I have to change this to

projects <- content(response, encoding = "UTF-8", type = "application/json")

Any helpful information I should add here from the response objects (e.g. the headers or sth.)?

deschen1 avatar Aug 23 '22 10:08 deschen1

One small update. I tested it a bit further and the warning only occurs on Windows machines, NOT on MACs.

My IT suggests that it might indeed be related to this part here:

parse_http_headers <- function(raw) {
  lines <- strsplit(rawToChar(raw), "\r?\n")[[1]]

  new_response <- grepl("^HTTP", lines)
  grps <- cumsum(new_response)

  lapply(unname(split(lines, grps)), parse_single_header)
}

especially the second line, because there is no carriage return on Windows. Or the order of line break and carraige return is different than on Mac, (I didn't fully understand this part).

OK, did some more debugging, and the issue is in httr:::parse_single_header.

Before getting into that function we are in the above function where the results of the lines, new_response and grps is:


lines <- c("HTTP/1.1 200 OK", "Server: nginx/1.18.0", "Date: Tue, 23 Aug 2022 15:50:57 GMT", 
           "Content-Type: application/json", "Vary: Accept, Accept-Language, Origin, Cookie", 
           "Allow: POST, GET", "HTTP_X_CORRELATION_ID: 44398-1", "Content-Language: en", 
           "X-Frame-Options: DENY", "X-Content-Type-Options: nosniff", "Referrer-Policy: same-origin", 
           "X-Protected-By: Sqreen", "Strict-Transport-Security: max-age=31536000; preload", 
           "Content-Encoding: gzip", "Via: 1.1 google", "Alt-Svc: h3=\":443\"; ma=2592000,h3-29=\":443\"; ma=2592000", 
           "Transfer-Encoding: chunked", "")

new_response <- c(TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE)

gprs <- c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L)

And the problem is in the second group. In the lapply, the thing that gets fed to the httr:::parse_single_header is:

lines <- c("HTTP_X_CORRELATION_ID: 44398-1", "Content-Language: en", "X-Frame-Options: DENY", 
           "X-Content-Type-Options: nosniff", "Referrer-Policy: same-origin", 
           "X-Protected-By: Sqreen", "Strict-Transport-Security: max-age=31536000; preload", 
           "Content-Encoding: gzip", "Via: 1.1 google", "Alt-Svc: h3=\":443\"; ma=2592000,h3-29=\":443\"; ma=2592000", 
           "Transfer-Encoding: chunked", "")

And if we feed it to the httr:::parse_single_header, then the warning occurs directly in the first line where parse_http_status is called:

parse_single_header <- function (lines) 
{
    status <- parse_http_status(lines[[1]])
    ...
}
parse_http_status <- function (x) 
{
  status <- as.list(strsplit(x, "\\s+")[[1]])
  names(status) <- c("version", "status", "message")[seq_along(status)]
  status$status <- as.integer(status$status)
  status
}

where x <- "HTTP_X_CORRELATION_ID: 44398-1"

And then the first line returns a list with two elements, but the names(status) part expects three elements and then wants to turn the status into an integer, which doesn't work here.

So that's what I can contribute and I hope there is something we/you can do about it.

FWIW, here's also the content of the raw object that is required to create the lines in the parse_http_headers function.

raw <- as.raw(c(0x48, 0x54, 0x54, 0x50, 0x2f, 0x31, 0x2e, 0x31, 0x20, 
0x32, 0x30, 0x30, 0x20, 0x4f, 0x4b, 0x0d, 0x0a, 0x53, 0x65, 0x72, 
0x76, 0x65, 0x72, 0x3a, 0x20, 0x6e, 0x67, 0x69, 0x6e, 0x78, 0x2f, 
0x31, 0x2e, 0x31, 0x38, 0x2e, 0x30, 0x0d, 0x0a, 0x44, 0x61, 0x74, 
0x65, 0x3a, 0x20, 0x54, 0x75, 0x65, 0x2c, 0x20, 0x32, 0x33, 0x20, 
0x41, 0x75, 0x67, 0x20, 0x32, 0x30, 0x32, 0x32, 0x20, 0x31, 0x35, 
0x3a, 0x35, 0x30, 0x3a, 0x35, 0x37, 0x20, 0x47, 0x4d, 0x54, 0x0d, 
0x0a, 0x43, 0x6f, 0x6e, 0x74, 0x65, 0x6e, 0x74, 0x2d, 0x54, 0x79, 
0x70, 0x65, 0x3a, 0x20, 0x61, 0x70, 0x70, 0x6c, 0x69, 0x63, 0x61, 
0x74, 0x69, 0x6f, 0x6e, 0x2f, 0x6a, 0x73, 0x6f, 0x6e, 0x0d, 0x0a, 
0x56, 0x61, 0x72, 0x79, 0x3a, 0x20, 0x41, 0x63, 0x63, 0x65, 0x70, 
0x74, 0x2c, 0x20, 0x41, 0x63, 0x63, 0x65, 0x70, 0x74, 0x2d, 0x4c, 
0x61, 0x6e, 0x67, 0x75, 0x61, 0x67, 0x65, 0x2c, 0x20, 0x4f, 0x72, 
0x69, 0x67, 0x69, 0x6e, 0x2c, 0x20, 0x43, 0x6f, 0x6f, 0x6b, 0x69, 
0x65, 0x0d, 0x0a, 0x41, 0x6c, 0x6c, 0x6f, 0x77, 0x3a, 0x20, 0x50, 
0x4f, 0x53, 0x54, 0x2c, 0x20, 0x47, 0x45, 0x54, 0x0d, 0x0a, 0x48, 
0x54, 0x54, 0x50, 0x5f, 0x58, 0x5f, 0x43, 0x4f, 0x52, 0x52, 0x45, 
0x4c, 0x41, 0x54, 0x49, 0x4f, 0x4e, 0x5f, 0x49, 0x44, 0x3a, 0x20, 
0x34, 0x34, 0x33, 0x39, 0x38, 0x2d, 0x31, 0x0d, 0x0a, 0x43, 0x6f, 
0x6e, 0x74, 0x65, 0x6e, 0x74, 0x2d, 0x4c, 0x61, 0x6e, 0x67, 0x75, 
0x61, 0x67, 0x65, 0x3a, 0x20, 0x65, 0x6e, 0x0d, 0x0a, 0x58, 0x2d, 
0x46, 0x72, 0x61, 0x6d, 0x65, 0x2d, 0x4f, 0x70, 0x74, 0x69, 0x6f, 
0x6e, 0x73, 0x3a, 0x20, 0x44, 0x45, 0x4e, 0x59, 0x0d, 0x0a, 0x58, 
0x2d, 0x43, 0x6f, 0x6e, 0x74, 0x65, 0x6e, 0x74, 0x2d, 0x54, 0x79, 
0x70, 0x65, 0x2d, 0x4f, 0x70, 0x74, 0x69, 0x6f, 0x6e, 0x73, 0x3a, 
0x20, 0x6e, 0x6f, 0x73, 0x6e, 0x69, 0x66, 0x66, 0x0d, 0x0a, 0x52, 
0x65, 0x66, 0x65, 0x72, 0x72, 0x65, 0x72, 0x2d, 0x50, 0x6f, 0x6c, 
0x69, 0x63, 0x79, 0x3a, 0x20, 0x73, 0x61, 0x6d, 0x65, 0x2d, 0x6f, 
0x72, 0x69, 0x67, 0x69, 0x6e, 0x0d, 0x0a, 0x58, 0x2d, 0x50, 0x72, 
0x6f, 0x74, 0x65, 0x63, 0x74, 0x65, 0x64, 0x2d, 0x42, 0x79, 0x3a, 
0x20, 0x53, 0x71, 0x72, 0x65, 0x65, 0x6e, 0x0d, 0x0a, 0x53, 0x74, 
0x72, 0x69, 0x63, 0x74, 0x2d, 0x54, 0x72, 0x61, 0x6e, 0x73, 0x70, 
0x6f, 0x72, 0x74, 0x2d, 0x53, 0x65, 0x63, 0x75, 0x72, 0x69, 0x74, 
0x79, 0x3a, 0x20, 0x6d, 0x61, 0x78, 0x2d, 0x61, 0x67, 0x65, 0x3d, 
0x33, 0x31, 0x35, 0x33, 0x36, 0x30, 0x30, 0x30, 0x3b, 0x20, 0x70, 
0x72, 0x65, 0x6c, 0x6f, 0x61, 0x64, 0x0d, 0x0a, 0x43, 0x6f, 0x6e, 
0x74, 0x65, 0x6e, 0x74, 0x2d, 0x45, 0x6e, 0x63, 0x6f, 0x64, 0x69, 
0x6e, 0x67, 0x3a, 0x20, 0x67, 0x7a, 0x69, 0x70, 0x0d, 0x0a, 0x56, 
0x69, 0x61, 0x3a, 0x20, 0x31, 0x2e, 0x31, 0x20, 0x67, 0x6f, 0x6f, 
0x67, 0x6c, 0x65, 0x0d, 0x0a, 0x41, 0x6c, 0x74, 0x2d, 0x53, 0x76, 
0x63, 0x3a, 0x20, 0x68, 0x33, 0x3d, 0x22, 0x3a, 0x34, 0x34, 0x33, 
0x22, 0x3b, 0x20, 0x6d, 0x61, 0x3d, 0x32, 0x35, 0x39, 0x32, 0x30, 
0x30, 0x30, 0x2c, 0x68, 0x33, 0x2d, 0x32, 0x39, 0x3d, 0x22, 0x3a, 
0x34, 0x34, 0x33, 0x22, 0x3b, 0x20, 0x6d, 0x61, 0x3d, 0x32, 0x35, 
0x39, 0x32, 0x30, 0x30, 0x30, 0x0d, 0x0a, 0x54, 0x72, 0x61, 0x6e, 
0x73, 0x66, 0x65, 0x72, 0x2d, 0x45, 0x6e, 0x63, 0x6f, 0x64, 0x69, 
0x6e, 0x67, 0x3a, 0x20, 0x63, 0x68, 0x75, 0x6e, 0x6b, 0x65, 0x64, 
0x0d, 0x0a, 0x0d, 0x0a))

One last update:

it seems the initial problem why it (now) doesn't work on Windows, but still works on Mac, is that this second http occurence "HTTP_X_CORRELATION_ID: 44398-1" is uppercase on Windows, but lowercase on Mac. And that's why the function gets confused on Windows because it checks for uppercase HTTP in the parse_http_headers function.

deschen1 avatar Aug 23 '22 15:08 deschen1

Investigated a bit further here and first thing is that HTTP headers are case insensitive, so not sure what httr would return if a website uses lowercase (because the grepl in parse_http_status seems to check for uppercase HTTP): https://stackoverflow.com/questions/5258977/are-http-headers-case-sensitive

And second thing is that the parsing of the status purely by the header starting with "http" is problematic. Maybe the header should be split and then checked if any element contains a number that would be a valid status response (e.g. a 200 or 404 or sth.).

deschen1 avatar Aug 24 '22 13:08 deschen1

httr has been superseded in favour of httr2, so is no longer under active development. If this problem is still important to you in httr2, I'd suggest filing an issue offer there 😄. Thanks for using httr!

hadley avatar Oct 31 '23 20:10 hadley