
`curl_fetch_memory()` for LDAP query against MS Active Directory returns embedded nul bytes when called from Windows but not from Linux

Status: Open · mskyttner opened this issue 4 years ago · 4 comments

I ran into an issue using `curl_fetch_memory()` with v4.2 of the curl R package: calls made from R on Linux (with system libcurl 7.58.0 installed) return different raw bytes than calls made from Windows (where `curl -V` at the command prompt reports libcurl 7.55.1).

Since the Linux and Windows environments have the same version of the R curl package, the difference seems to come from the libcurl provided by the OS, but I cannot ask Windows users to upgrade or recompile it.
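One thing that may be worth checking first: `curl -V` at a Windows command prompt reports the version of the curl executable on the PATH, which is not necessarily the libcurl the R curl package is linked against (as far as I know, the Windows binary of the R package ships its own libcurl build). A minimal sketch that asks the R package directly via `curl::curl_version()`; the helper name `libcurl_info()` is hypothetical.

```r
# Report the libcurl that the R 'curl' package itself uses.
# This can differ from the curl.exe found on the PATH.
libcurl_info <- function() {
  if (!requireNamespace("curl", quietly = TRUE)) {
    stop("the 'curl' R package is not installed")
  }
  v <- curl::curl_version()
  list(
    libcurl = v$version,              # libcurl version string
    ssl     = v$ssl_version,          # TLS backend
    ldap    = "ldap" %in% v$protocols # whether LDAP support is compiled in
  )
}

str(libcurl_info())
```

Running this on both machines would show whether the two R sessions really use libcurl 7.55.1 and 7.58.0, or some other pair of versions.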

So I'm looking for a workaround on the R side for the nul bytes returned on Windows, but I'm not sure what the best way to handle them is.

The R code I use to make a request against the Active Directory server looks like this:

```r
# set parameters for the curl_fetch_memory() call
query <- ldap_cmd_curl(filter, cfg$ldap_host, cfg$ldap_base)
userpwd <- paste0(cfg$ldap_user, ":", cfg$ldap_pass)

handle <- curl::new_handle()

curl::handle_setopt(handle, userpwd = userpwd)

curl::handle_setheaders(handle,
  "Content-Type" = "text/plain;charset=UTF-8",
  "Cache-Control" = "no-cache",
  "User-Agent" = "bibliomatrix R-package"
)

res <- curl::curl_fetch_memory(query, handle)
```

On Windows, `rawToChar(res$content)` fails with an "embedded nul in string" error, so I get nothing back (the same call works on Linux). Using `readChar()` on Windows instead returns some of the data, but only up to the first nul byte. On Linux I get everything, because no nul bytes are embedded there.
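A minimal sketch of two ways to get a string out of a raw vector that contains nul bytes, without relying on `rawToChar()` accepting them. The sample bytes are made up and stand in for `res$content`; `strip_nul()` and `replace_nul()` are hypothetical helper names.

```r
# Made-up payload standing in for res$content; 0x00 is a nul byte
content <- as.raw(c(0x63, 0x6e, 0x3a, 0x00, 0x20, 0x41, 0x42, 0x00))

# rawToChar(content) errors here: the nul at position 4 is embedded
# (rawToChar() only tolerates trailing nuls)

# Option 1: drop every nul byte, then convert
strip_nul <- function(x) rawToChar(x[x != as.raw(0)])

# Option 2: replace each nul byte with a single-byte separator
replace_nul <- function(x, with = " ") {
  stopifnot(nchar(with, type = "bytes") == 1)
  x[x == as.raw(0)] <- charToRaw(with)
  rawToChar(x)
}

strip_nul(content)    # "cn: AB"
replace_nul(content)  # "cn:  AB "
```

Dropping the nul bytes is enough if they are padding; if they separate values, replacing them with a delimiter such as `"\n"` keeps those boundaries. If it turns out that a nul follows nearly every character, the bytes are more likely a multi-byte encoding than stray padding, and stripping them would hide that.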

When I save the responses from the same LDAP query, run on Windows and on Linux with exactly the same parameters, to disk using `readr::write_rds()` and compare the two objects, the content differs:

```r
# o1: R package curl v4.2, Windows libcurl 7.55.1 (as reported by curl -V)
# o2: R package curl v4.2, Linux libcurl 7.58.0

o1 <- readr::read_rds("~/Downloads/curlres.rds")   # Windows
o2 <- readr::read_rds("~/Downloads/curlres2.rds")  # Linux

identical(o1, o2)      # returns FALSE
rawToChar(o1$content)  # complains about embedded nul in string
```
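Note that `identical(o1, o2)` compares the whole response list returned by `curl_fetch_memory()`, which also holds fields such as `headers` and `times`, so it would likely be `FALSE` even if the payloads matched. A minimal sketch for narrowing the comparison down to `$content`; `nul_report()` and `same_without_nul()` are hypothetical helpers.

```r
# Summarise where the nul bytes sit in a raw vector
nul_report <- function(x) {
  pos <- which(x == as.raw(0))
  list(
    n_bytes   = length(x),  # total payload size
    n_nul     = length(pos),
    first_nul = head(pos, 5) # positions of the first few nul bytes
  )
}

# Are two payloads the same once all nul bytes are dropped?
same_without_nul <- function(a, b) {
  identical(a[a != as.raw(0)], b[b != as.raw(0)])
}

# Usage with the saved responses:
# nul_report(o1$content)
# nul_report(o2$content)
# same_without_nul(o1$content, o2$content)
```

If `same_without_nul()` returns `TRUE`, the nul bytes are the only difference; if not, the spacing in `first_nul` may hint at what they are.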

Could this be an encoding issue? On Linux, `readr::guess_encoding()` suggests ASCII; on Windows it cannot determine the encoding.

I tried `iconv()`, but it wasn't "nul byte friendly", so in the end I worked around the nul bytes on the receiving end like this: https://github.com/KTH-Library/bibliomatrix/blob/master/R/ldap.R#L155-L158
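`iconv()` on a character string cannot see nul bytes, because R strings cannot hold them, but `iconv()` also accepts a list of raw vectors. A minimal sketch under one unconfirmed assumption: that the Windows payload is ASCII text stored as UTF-16LE, which would put a 0x00 after every character. Nothing above establishes that; the check is only meant to rule it in or out. `looks_utf16le()` and `decode_utf16le()` are hypothetical helpers.

```r
# Heuristic: ASCII text in UTF-16LE has 0x00 at every even byte position
looks_utf16le <- function(x) {
  n <- length(x)
  n > 0 && n %% 2 == 0 && all(x[seq(2, n, by = 2)] == as.raw(0))
}

# Decode raw bytes as UTF-16LE; passing a list of raw vectors lets
# iconv() work on the bytes directly instead of on a string
decode_utf16le <- function(x) {
  iconv(list(x), from = "UTF-16LE", to = "UTF-8")
}

x <- as.raw(c(0x63, 0x00, 0x6e, 0x00))  # "cn" in UTF-16LE
if (looks_utf16le(x)) decode_utf16le(x) # "cn"
```

If `looks_utf16le(o1$content)` is `FALSE`, the nul bytes are probably not an encoding artifact of this kind, and stripping or replacing them is the more plausible route.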

Sorry for not providing a better reproducible example in this issue. The LDAP response objects contain personal data, so I don't want to `dput()` them here, but I can share them through a different channel if needed.

Any recommendations on alternatives or options for dealing with embedded nul bytes?

mskyttner, Nov 13 '19 09:11