httr icon indicating copy to clipboard operation
httr copied to clipboard

Avoid URL-encoding cookie values twice

Open salim-b opened this issue 6 years ago • 2 comments

When I set a cookie for a httr GET request, the provided cookie value always gets URL-encoded regardless if it already was or not:

library(magrittr)

# define cookie value
cookie_value <- "s:AK+G5ez1"

# URL-encode it
cookie_value_enc <- xml2::url_escape(cookie_value)

# make GET request supplying the URL-encoded cookie value
result <- httr::GET(url = "https://www.republik.ch/",
                    httr::set_cookies(`connect.sid` = cookie_value_enc))

# print cookie value as it was transmitted by httr (double-URL-encoded!)
# and URL-decode and print it until it matches the original value
cookie_value_httr <-
  result$request$options$cookie %>%
  stringr::str_remove(pattern = "connect.sid=")

cookie_value_httr
#> [1] "s%253AAK%252BG5ez1"

is_eq <- FALSE

while ( !is_eq )
{
  cookie_value_httr %<>% xml2::url_unescape()
  if ( cookie_value_httr == cookie_value ) is_eq <- TRUE
  print(cookie_value_httr)
}
#> [1] "s%3AAK%2BG5ez1"
#> [1] "s:AK+G5ez1"

Created on 2019-09-07 by the reprex package (v0.3.0)

The curl CLI (7.47.0) on the other hand doesn't do this (i.e. the cookie value gets transmitted unmodified) when I run either of:

curl --cookie 'connect.sid=s:AK+G5ez1' https://www.republik.ch/
curl --cookie 'connect.sid=s%3AAK%2BG5ez1' https://www.republik.ch/

I'm using httr 1.4.1 (with underlying libcurl 7.47.0) with R 3.6.1 on Ubuntu 16.04.

salim-b avatar Sep 07 '19 11:09 salim-b

When we look at the code, we see that indeed set_cookie will encode any strings with curl_escape https://github.com/r-lib/httr/blob/6b84c576f866956e01835e1b6f02c55c8012d4e0/R/cookies.r#L14-L23

One way could be to add an argument to set_config() or so that it could respect class AsIs when string is encapsulated in I( )

Currently, you can do as you in curl by providing directly the cookie option

cookie_value <- "s:AK+G5ez1"
# URL-encode it
cookie_value_enc <- xml2::url_escape(cookie_value)
cookie_value_enc
#> [1] "s%3AAK%2BG5ez1"
# this will get escaped again
httr::GET("https://httpbin.org/cookies", httr::set_cookies(`connect.sid` = cookie_value_enc))
#> Response [https://httpbin.org/cookies]
#>   Date: 2019-09-07 12:20
#>   Status: 200
#>   Content-Type: application/json
#>   Size: 63 B
#> {
#>   "cookies": {
#>     "connect.sid": "s%253AAK%252BG5ez1"
#>   }
#> }
# provide a not already escape cookie
httr::GET("https://httpbin.org/cookies", httr::set_cookies(`connect.sid` = cookie_value))
#> Response [https://httpbin.org/cookies]
#>   Date: 2019-09-07 12:20
#>   Status: 200
#>   Content-Type: application/json
#>   Size: 59 B
#> {
#>   "cookies": {
#>     "connect.sid": "s%3AAK%2BG5ez1"
#>   }
#> }
# or pass directly the encoded cookie using config
httr::GET("https://httpbin.org/cookies", httr::config(cookie = paste0("connect.sid=", cookie_value_enc)))
#> Response [https://httpbin.org/cookies]
#>   Date: 2019-09-07 12:20
#>   Status: 200
#>   Content-Type: application/json
#>   Size: 59 B
#> {
#>   "cookies": {
#>     "connect.sid": "s%3AAK%2BG5ez1"
#>   }
#> }

Created on 2019-09-07 by the reprex package (v0.3.0)

Hope it helps

cderv avatar Sep 07 '19 12:09 cderv

Thanks for the httr::config(cookie = paste0("connect.sid=", cookie_value_enc)) hint!

But I'm still wondering

  1. is URL-escaping cookie values really necessary? The curl CLI doesn't seem to do it and the request succeeds in my case (i.e. the server recognizes the cookie value, which of course is actually a different one than I'm posting here):

    curl --verbose --cookie 'connect.sid=s:AK+G5ez1' https://www.republik.ch/ 2>&1 | grep 'Cookie: '
    
    > Cookie: connect.sid=s:AK+G5ez1
    
  2. if you come to the conclusion that 1) is still necessary, wouldn't it be possible to somehow auto-detect if the value(s) provided to set_cookies() already are URL-encoded, and if so, leave them untouched?

    I suppose this is kind of an "ambuigity hell", but at least there seem to be some "good enough" solutions...

salim-b avatar Sep 07 '19 12:09 salim-b

Since httr is superseded in favour of httr2, I've filed an isuse over there: https://github.com/r-lib/httr2/issues/369

hadley avatar Oct 31 '23 20:10 hadley