curl
`curl_fetch_memory()` for LDAP query against MS Active Directory returns embedded nul bytes when called from Windows but not from Linux
I ran into an issue using `curl_fetch_memory()` with v4.2 of the `curl` R package: calls made from Linux (with system libcurl 7.58.0 installed) return different raw bytes than calls made from Windows (with libcurl 7.55.1, as reported by `curl -V` at the command prompt).
Since the Linux and Windows environments have the same version of the R `curl` package, the difference seems to come from the libcurl used by the OS, but I cannot ask Windows users to upgrade or recompile.
So I'm looking for a workaround on the R side for dealing with the nul bytes on Windows, but I'm not sure what the best way to handle them is.
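Note that `curl -V` describes the `curl` executable found on the PATH, which is not necessarily the libcurl the R package is linked against (the Windows build of the R package may bundle its own libcurl). A minimal check to run from inside R on both machines, using only `curl::curl_version()` and `packageVersion()`, could look like this:

```r
# Report the libcurl build that the R curl package itself uses,
# which may differ from the `curl` executable on the PATH.
info <- curl::curl_version()
cat("R curl package:  ", format(packageVersion("curl")), "\n")
cat("libcurl version: ", info$version, "\n")
cat("SSL backend:     ", info$ssl_version, "\n")
# TRUE if this libcurl build lists LDAP among its supported protocols
cat("LDAP supported:  ", "ldap" %in% info$protocols, "\n")
```

If the reported libcurl versions or SSL backends differ between the two machines, that narrows down which build is producing the nul bytes.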
The R code I use to make a request against the Active Directory server looks like this:
```r
# set parameters for the curl_fetch_memory() call;
# ldap_cmd_curl() is our own helper (not part of curl) that builds the LDAP query URL
query <- ldap_cmd_curl(filter, cfg$ldap_host, cfg$ldap_base)
userpwd <- paste0(cfg$ldap_user, ":", cfg$ldap_pass)
handle <- curl::new_handle()
curl::handle_setopt(handle, userpwd = userpwd)
curl::handle_setheaders(handle,
  "Content-Type" = "text/plain;charset=UTF-8",
  "Cache-Control" = "no-cache",
  "User-Agent" = "bibliomatrix R-package"
)
# perform the request; the response body is returned as a raw vector in res$content
res <- curl::curl_fetch_memory(query, handle)
```
On Windows, `rawToChar(res$content)` fails because of the embedded nul bytes, whereas the same call works on Linux. If I use `readChar()` instead, I get some of the data back on Windows, but only the part up to the first nul byte. On Linux I get everything, because there are no embedded nul bytes there.
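One way to keep everything rather than stopping at the first nul is to work on the raw vector directly and split it at the nul bytes before converting each piece. A minimal base-R sketch; `split_on_nul()` is a hypothetical helper name, and it assumes the bytes between the nuls are otherwise valid text in the session's encoding:

```r
# Hypothetical helper: split a raw vector at nul (0x00) bytes and
# convert each non-empty segment to a character string.
split_on_nul <- function(x) {
  is_nul <- x == as.raw(0)
  # Segment id: increases by one at every nul byte
  segment <- cumsum(is_nul)
  # Indices of the non-nul bytes, grouped by segment; runs of
  # consecutive nuls therefore produce no empty strings
  idx <- which(!is_nul)
  pieces <- split(idx, segment[idx])
  vapply(pieces, function(i) rawToChar(x[i]), character(1), USE.NAMES = FALSE)
}

# Made-up example: "ab", a nul byte, then "cd"
bytes <- c(charToRaw("ab"), as.raw(0), charToRaw("cd"))
split_on_nul(bytes)
#> [1] "ab" "cd"
```

If the nul bytes carry no meaning at all, a blunter alternative is to drop them before converting, e.g. `rawToChar(res$content[res$content != as.raw(0)])`.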
When I save the responses from the same LDAP query, made on Windows and on Linux with exactly the same parameters, to disk using `readr::write_rds()` and compare the two objects, the content differs:
```r
# o1: R package curl v4.2 on Windows, libcurl 7.55.1 (as reported by curl -V)
# o2: R package curl v4.2 on Linux, libcurl 7.58.0
o1 <- readr::read_rds("~/Downloads/curlres.rds")   # Windows
o2 <- readr::read_rds("~/Downloads/curlres2.rds")  # Linux
# compare only the response bodies; the full response objects also
# contain timing information that differs between any two requests
identical(o1$content, o2$content)  # returns FALSE
rawToChar(o1$content)  # error: embedded nul in string
```
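To see where the two payloads diverge without converting them to strings, the raw vectors can be compared byte by byte. A small base-R sketch; `describe_nuls()` and `first_difference()` are hypothetical helper names, and the example uses made-up bytes since the real responses contain personal data:

```r
# Hypothetical helper: summarise where nul bytes occur in a raw vector.
describe_nuls <- function(x) {
  pos <- which(x == as.raw(0))
  list(
    n_bytes = length(x),
    n_nul   = length(pos),
    first   = if (length(pos)) pos[1] else NA_integer_,
    # Share of nuls at even (1-based) positions; a value near 1 together
    # with n_nul close to n_bytes / 2 would hint at UTF-16LE-encoded ASCII
    even_share = if (length(pos)) mean(pos %% 2 == 0) else NA_real_
  )
}

# Hypothetical helper: 1-based index of the first byte where a and b differ,
# or NA if the two vectors are identical.
first_difference <- function(a, b) {
  n <- min(length(a), length(b))
  d <- which(a[seq_len(n)] != b[seq_len(n)])
  if (length(d)) d[1] else if (length(a) != length(b)) n + 1L else NA_integer_
}

# Made-up bytes: "ab" as UTF-16LE (61 00 62 00) versus plain "ab"
win   <- c(charToRaw("a"), as.raw(0), charToRaw("b"), as.raw(0))
linux <- charToRaw("ab")
describe_nuls(win)$even_share  # 1
first_difference(win, linux)   # 2
```

On the real data this would be `describe_nuls(o1$content)` and `first_difference(o1$content, o2$content)`.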
Could this be an encoding issue? On Linux, `readr::guess_encoding()` suggests ASCII, while on Windows it does not identify an encoding.
I tried `iconv()`, but it wasn't "nul byte friendly", so in the end I used this workaround for the nul bytes on the receiving end: https://github.com/KTH-Library/bibliomatrix/blob/master/R/ldap.R#L155-L158
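If the nul bytes do follow a UTF-16LE pattern (an assumption that would need to be confirmed against the actual Windows payload), base R's `iconv()` can decode raw bytes directly when they are passed as a list of raw vectors, so the data never has to pass through a string with embedded nuls. A minimal sketch; `decode_utf16le()` is a hypothetical helper name:

```r
# Hypothetical helper: decode a raw vector assumed to be UTF-16LE into UTF-8.
# iconv() accepts a list of raw vectors as input, so the nul bytes are
# handled by the converter instead of by rawToChar().
decode_utf16le <- function(x) {
  out <- iconv(list(x), from = "UTF-16LE", to = "UTF-8")
  # iconv() returns NA when the input cannot be converted
  if (is.na(out)) stop("bytes could not be decoded as UTF-16LE")
  out
}

# Made-up example: "ab" encoded as UTF-16LE is 61 00 62 00
utf16 <- as.raw(c(0x61, 0x00, 0x62, 0x00))
decode_utf16le(utf16)
#> [1] "ab"
```

If the nuls turn out not to follow that pattern, this approach does not apply, and splitting or dropping the nul bytes is the safer fallback.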
Sorry for not providing a better reproducible example in this issue: the LDAP response objects contain personal data, so I don't want to `dput()` them here, but I can provide them through a different channel if needed.
Any recommendations for alternatives or options for dealing with embedded nul bytes?