Allowing `httr::write_disk` to actually create files
Session Info
r$> devtools::session_info()
─ Session info ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
setting value
version R version 4.2.0 (2022-04-22)
os macOS Monterey 12.1
system x86_64, darwin21.3.0
ui unknown
language (EN)
collate en_US.UTF-8
ctype en_US.UTF-8
tz Europe/London
date 2023-05-11
pandoc 2.11.1.1 @ /usr/local/bin/pandoc
─ Packages ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
! package * version date (UTC) lib source
base64enc 0.1-3 2015-07-28 [1] CRAN (R 4.2.0)
brio 1.1.3 2021-11-30 [1] CRAN (R 4.2.0)
cachem 1.0.6 2021-08-19 [1] CRAN (R 4.2.0)
callr 3.7.0 2021-04-20 [1] CRAN (R 4.2.0)
cellranger 1.1.0 2016-07-27 [1] CRAN (R 4.2.0)
chron 2.3-57 2022-05-30 [1] CRAN (R 4.2.0)
class 7.3-20 2022-01-16 [2] CRAN (R 4.2.0)
classInt 0.4-7 2022-06-10 [1] CRAN (R 4.2.0)
cli 3.4.1 2022-09-23 [1] CRAN (R 4.2.0)
crayon 1.5.1 2022-03-26 [1] CRAN (R 4.2.0)
crul 1.3 2022-09-03 [1] CRAN (R 4.2.0)
curl 4.3.2 2021-06-23 [1] CRAN (R 4.2.0)
DBI 1.1.3 2022-06-18 [1] CRAN (R 4.2.0)
desc 1.4.1 2022-03-06 [1] CRAN (R 4.2.0)
devtools 2.4.3 2021-11-30 [1] CRAN (R 4.2.0)
dplyr 1.0.10 2022-09-01 [1] CRAN (R 4.2.0)
e1071 1.7-11 2022-06-07 [1] CRAN (R 4.2.0)
ellipsis 0.3.2 2021-04-29 [1] CRAN (R 4.2.0)
fansi 1.0.3 2022-03-24 [1] CRAN (R 4.2.0)
fastmap 1.1.0 2021-01-25 [1] CRAN (R 4.2.0)
fauxpas 0.5.2 2023-05-03 [1] CRAN (R 4.2.0)
fs 1.5.2 2021-12-08 [1] CRAN (R 4.2.0)
generics 0.1.3 2022-07-05 [1] CRAN (R 4.2.0)
glue 1.6.2 2022-02-24 [1] CRAN (R 4.2.0)
httpcode 0.3.0 2020-04-10 [1] CRAN (R 4.2.0)
httr * 1.4.3 2022-05-04 [1] CRAN (R 4.2.0)
igraph 1.3.4 2022-07-19 [1] CRAN (R 4.2.0)
jsonlite 1.8.0 2022-02-22 [1] CRAN (R 4.2.0)
KernSmooth 2.23-20 2021-05-03 [2] CRAN (R 4.2.0)
lifecycle 1.0.3 2022-10-07 [1] CRAN (R 4.2.0)
magrittr 2.0.3 2022-03-30 [1] CRAN (R 4.2.0)
memoise 2.0.1 2021-11-26 [1] CRAN (R 4.2.0)
pillar 1.7.0 2022-02-01 [1] CRAN (R 4.2.0)
pkgbuild 1.3.1 2021-12-20 [1] CRAN (R 4.2.0)
pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.2.0)
pkgload 1.2.4 2021-11-30 [1] CRAN (R 4.2.0)
prettyunits 1.1.1 2020-01-24 [1] CRAN (R 4.2.0)
processx 3.5.3 2022-03-25 [1] CRAN (R 4.2.0)
proxy 0.4-27 2022-06-09 [1] CRAN (R 4.2.0)
ps 1.7.0 2022-04-23 [1] CRAN (R 4.2.0)
purrr 0.3.4 2020-04-17 [1] CRAN (R 4.2.0)
R6 2.5.1 2021-08-19 [1] CRAN (R 4.2.0)
Rcpp 1.0.9 2022-07-08 [1] CRAN (R 4.2.0)
readxl 1.4.0 2022-03-28 [1] CRAN (R 4.2.0)
remotes 2.4.2 2021-11-30 [1] CRAN (R 4.2.0)
rlang 1.0.6 2022-09-24 [1] CRAN (R 4.2.0)
rprojroot 2.0.3 2022-04-02 [1] CRAN (R 4.2.0)
rstudioapi 0.13 2020-11-12 [1] CRAN (R 4.2.0)
P safedata * 1.1.2.9000 2023-05-03 [?] load_all()
sessioninfo 1.2.2 2021-12-06 [1] CRAN (R 4.2.0)
sf 1.0-11 2023-03-15 [1] CRAN (R 4.2.0)
testthat * 3.1.4 2022-04-26 [1] CRAN (R 4.2.0)
tibble 3.1.7 2022-05-03 [1] CRAN (R 4.2.0)
tidyselect 1.2.0 2022-10-10 [1] CRAN (R 4.2.0)
triebeard 0.3.0 2016-08-04 [1] CRAN (R 4.2.0)
units 0.8-0 2022-02-05 [1] CRAN (R 4.2.0)
urltools 1.7.3 2019-04-14 [1] CRAN (R 4.2.0)
usethis 2.1.5 2021-12-09 [1] CRAN (R 4.2.0)
utf8 1.2.2 2021-07-24 [1] CRAN (R 4.2.0)
vctrs 0.4.1 2022-04-13 [1] CRAN (R 4.2.0)
webmockr * 0.9.0 2023-02-28 [1] CRAN (R 4.2.0)
whisker 0.4 2019-08-28 [1] CRAN (R 4.2.0)
withr 2.5.0 2022-03-03 [1] CRAN (R 4.2.0)
[1] /usr/local/lib/R/4.2/site-library
[2] /usr/local/Cellar/r/4.2.0/lib/R/library
P ── Loaded and on-disk path mismatch.
I'm trying to use webmockr to mock an API for my safedata package, to allow code examples and vignette building in packaging to run without requiring a running server providing the API.
The issue I'm having is that I'd like to have request stub actually create a file from the response body. That is how httr::GET with write_disk works without mocking - the response body is written to the named file. However, when the request is stubbed then I think write_disk is ignored and replaced with whatever is set in the to_return body, which could be an existing file containing the response or some other in memory response. Or at least I think that is how it works (see R code below).
I can see why this is used - there's no duplication of content and running httr:content on the response object gets you the body - but in my case I want to use the request to build up a local file structure in the same way that it would if I wasn't mocking the API.
Is there a way to get webmockr to provide a response body from a file (or directly!) and allow httr to write it out to a new file?
r$> ## make a temp file
r$> f <- tempfile(fileext = ".json")
r$> ## write something to the file
r$> cat("{\"hello\":\"world\"}\n", file = f)
r$> httr_mock()
r$> stub_request("get", "https://httpbin.org/get") %>%
to_return(body = file(f),
headers = list('content-type' = "application/json"))
r$> # Stubbed request ignores write disk and points to body file
r$> out <- GET("https://httpbin.org/get", write_disk('new_file.txt'))
r$> out
Response [https://httpbin.org/get]
Date: 2023-05-11 23:25
Status: 200
Content-Type: application/json
Size: 18 B
<ON DISK> /var/folders/62/cz0ctby96lq7ggnytfbtd__40000gn/T//RtmpKOpQFB/filec99b220f73af.json
r$> file.exists('new_file.txt')
[1] FALSE
r$> stub_request("get", "https://httpbin.org/get2") %>%
to_return(body = readLines(f),
headers = list('content-type' = "application/json"))
r$> # Again, write_disk is ignored and in memory body is used
r$> out <- GET("https://httpbin.org/get2", write_disk('new_file2.txt'))
r$> out
Response [https://httpbin.org/get2]
Date: 2023-05-11 23:49
Status: 200
Content-Type: application/json
Size: 17 B
r$> file.exists('new_file2.txt')
[1] FALSE
r$> httr::content(out)
$hello
[1] "world"
r$> # Turn off mocking
r$> httr_mock(FALSE)
r$> # 'Normal' usage writes the content to the named file.
r$> out <- GET("https://httpbin.org/get", write_disk('new_file.txt'))
r$> out
Response [https://httpbin.org/get]
Date: 2023-05-11 22:25
Status: 200
Content-Type: application/json
Size: 367 B
<ON DISK> new_file.txt
r$> file.exists('new_file.txt')
[1] TRUE
Thanks for the issue! I'll take a look and get back to you soon
It may well be out of scope but it wasn't obvious that it wouldn't work. I'm not trying to record an existing API - I can see that vcr would be the right tool for the job - just enforce that a particular URL will result in content from a local file being passed in to httr::GET and then into whichever output option is set there.
I assume you saw ?mocking-disk-writing given your examples?
See also https://github.com/ropensci/webmockr/issues/57 and https://github.com/ropensci/vcr/issues/81
I think the issue is that for this to work with webmockr the file has to exist already. In your example, you're creating that tempfile f, but then trying to use a completely new file in write_disk that doesn't exist yet.
See also the function mock_file(), which might be helpful.
If I understand the webmockr design correctly:
- in un-mocked use,
httr::GETtakes the body content from the URL and puts that into the file specified inwrite_disk, which can be a completely new file. - In mocked use, the
webmockr::HttrAdaptertakes the content from a file on disk but whenGETis used withwrite_disksubstitutes the path to that file for any named file. That means that thehttr::contentcan retrieve the content from theresponseobject without duplicating the file. But that mechanism does preclude creating a duplicate file in a new location. - In my specific use case, I don't just want to mock the
httr::contentbut also create the file when usinghttr::GET.
Is that right? My use case seems like it might be outside the package design and scope. It would either need an extra argument (actual_outfile) to be passed to HttrAdapter or possibly a package setting (actually_duplicate) to modify the behaviour and that seems like a clunky edge case.
- Correct on httr w/o mocking
- I think that's right. You can always just use
overwrite=TRUEinwrite_disk - Right, that's clear
I don't think it's outside what webmockr can do. Can you try the below example that uses mock_file, i think that satisfies your use case of creating a new file during the get request:
library(httr)
f <- tempfile(fileext = ".json")
file.exists(f)
stub_request("get", "https://hb.opencpu.org/get") %>%
to_return(
body = mock_file(path = f, payload = "{\"foo\": \"bar\"}"),
headers = list('content-type' = "application/json")
)
file.exists(f)
out <- GET("https://hb.opencpu.org/get", write_disk(f))
out
file.exists(f)
out$content
readLines(out$content)
content(out, "text", encoding = "UTF-8")
See how file.exists is FALSE until after the GET call is made
That's really useful - I hadn't quite got that mechanism straight in my head. I was trying to avoid having to hold the file contents in memory in the stub registry, but the payloads aren't that big!
The thing that is more of an issue is that I didn't mention that some of my payloads are binary files. The mock_file docs only mention character content and - tracing mock_file handling back into the Adapter class - cat is used to write the content. I guess mock_file could be extended to detect the content type (or rely on headers) and write the content appropriately but that definitely looks like new functionality.
I think i could support binary too. Would adding that be the last thing you need for this to be useable?
Yes - I think that would do it. For the moment, I've got a solution working using a really basic httr callback, but being able to use the full range of wi_th and to_return would be great.
A more general comment is that I found it hard to understand where the content of a response is coming from / going to. If the use case is simply to get the content itself - which is probably 99% of the time - then that doesn't matter, but if did for me. The basic pattern seems to be that the type of thebody argument to to_return always supersedes any setting of httr::write_disk or httr::write_memory.
-
If a source filename or a
fileconnection is passed in, then the content is always a pointer to the source file.Incidentally, if the filename is a string, it isn't classed as an
httr::pathso the content is the path name and doesn't show as<ON DISK>inprint(response). BecausefromJSONchecks if thetxtargument is a filename, the actual content does then get read in. It seems like it would be more consistent if the filename was converted to apathobject in the response and thenhttr::contentwould load the actual content and pass that on to the parser. I can imagine that might be something where different parser behaviour makes it easier to pass the filename around rather than the content though. -
If character data is passed in, the response content is always in memory, even when
write_diskis used. -
If a
mock_fileis passed in then the content is always read into the registry and then is always written to the file path used inmock_fileeven whenwrite_memoryis requested.
I'm not quite sure what I'm asking for here but I found some of those outcomes surprising. One approach might be something like a separation in to_return between body (which sets the content) and destination, which could be as.is, or one of the httr::write_ methods to replace the existing arguments to GET (or an equivalent for crul?).
library(httr)
library(webmockr)
enable()
mock_file <- tempfile(fileext = ".json")
out_file <- tempfile(fileext = ".json")
source_file <- tempfile(fileext = ".json")
cat("{\"foo\": \"bar\"}", file = source_file)
# 4 x 2 combinations of possible to_return bodies and httr output settings
bodies <- list(
"source_file" = source_file,
"source_path" = source_file,
"char" = "{\"foo\": \"bar\"}",
"mock_file" = mock_file(path = mock_file, payload = "{\"foo\": \"bar\"}")
)
httr_out <- list(
"mem" = write_memory(),
"disk" = write_disk(out_file)
)
# filename/role lookup
files <- list("mock_file", "out_file", "source_file")
names(files) <- c(mock_file, out_file, source_file)
stub_registry_clear()
# Loop over combinations
for (idxb in seq_along(bodies)) {
for (idxo in seq_along(httr_out)) {
# Get inputs
bod <- bodies[idxb]
out <- httr_out[idxo]
if (names(bod) == "source_file") {
body <- file(bod[[names(bod)]])
} else {
body <- bod[[names(bod)]]
}
# Create stub with requested body
stub <- stub_request("get", "https://hb.opencpu.org/get") %>%
to_return(
body = body,
headers = list("content-type" = "application/json")
)
# GET response with requested output
response <- GET("https://hb.opencpu.org/get", out[[names(out)]])
# Report
content_is_httr_path <- inherits(response$content, "path")
if (content_is_httr_path) {
actual_content <- response$content
} else {
actual_content <- content(response, "text", encoding = "UTF-8")
}
cat(
"body: ", names(bod),
"\nhttr requested output: ", names(out),
"\nroundtrip success: ",
all.equal(content(response), list("foo" = "bar")),
"\ncontent is httr path: ", content_is_httr_path,
"\nactual content: ", actual_content,
"\nwhich path: ", files[[actual_content]],
"\n-------------------\n"
)
remove_request_stub(stub)
}
}
body: source_file
httr requested output: mem
roundtrip success: TRUE
content is httr path: TRUE
actual content: /var/folders/62/cz0ctby96lq7ggnytfbtd__40000gn/T//Rtmp1wVdDX/filef55676e491a7.json
which path: source_file
-------------------
body: source_file
httr requested output: disk
roundtrip success: TRUE
content is httr path: TRUE
actual content: /var/folders/62/cz0ctby96lq7ggnytfbtd__40000gn/T//Rtmp1wVdDX/filef55676e491a7.json
which path: source_file
-------------------
body: source_path
httr requested output: mem
roundtrip success: TRUE
content is httr path: FALSE
actual content: /var/folders/62/cz0ctby96lq7ggnytfbtd__40000gn/T//Rtmp1wVdDX/filef55676e491a7.json
which path: source_file
-------------------
body: source_path
httr requested output: disk
roundtrip success: TRUE
content is httr path: FALSE
actual content: /var/folders/62/cz0ctby96lq7ggnytfbtd__40000gn/T//Rtmp1wVdDX/filef55676e491a7.json
which path: source_file
-------------------
body: char
httr requested output: mem
roundtrip success: TRUE
content is httr path: FALSE
actual content: {"foo": "bar"}
which path:
-------------------
body: char
httr requested output: disk
roundtrip success: TRUE
content is httr path: FALSE
actual content: {"foo": "bar"}
which path:
-------------------
body: mock_file
httr requested output: mem
roundtrip success: TRUE
content is httr path: TRUE
actual content: /var/folders/62/cz0ctby96lq7ggnytfbtd__40000gn/T//Rtmp1wVdDX/filef5566e9582e8.json
which path: mock_file
-------------------
body: mock_file
httr requested output: disk
roundtrip success: TRUE
content is httr path: TRUE
actual content: /var/folders/62/cz0ctby96lq7ggnytfbtd__40000gn/T//Rtmp1wVdDX/filef5566e9582e8.json
which path: mock_file
-------------------
Okay, I can make a change for binary support
If a mock_file is passed in then the content is always read into the registry and then is always written to the file path used in mock_file even when write_memory is requested.
hmm, mock_file is only meant to help with mocking write_disk and equivalent in crul, so its not surprising that the behavior when using mock_file with write_memory is not what you'd expect - i'm not sure I want to make any changes along these lines
You're right there is a discrepancy between what body is returned and what you request httr to do wrt disk/memory. The issue is that we need to tell webmockr what to do explicitly - the http client package (httr, crul) does not and should not determine what is returned in the mocked response other than indirectly - we grab some things from the request object to put into the response.
In this thread the focus is on mocking the response body when the http client library writes to disk. Unfortunately, the Ruby library webmock this is based off does not handle this use case of an http library writing directly to disk. It just doesn't happen in Ruby. They make you do an http request, then write to disk if you want to do that. So there's no nicely paved route for how to handle this.
Many thanks.
Interesting that Ruby forces you to save to disk manually - there's a lot that can go wrong there with content-type and encodings! On the general comment - which is way outside this issue - I think the model I have in my head would be to separate body from destination in creating a stub_request. I think there are four options for destination:
-
respect_call(whatever the actual call tohttrorcrulset up), -
divert_to_memory(forces the response body into memory), -
divert_to_disk(forces the response body into a named file, which might be different from an existingwrite_disk), -
load_from_source(which only applies tofilebodies and points the response object to the original file to avoid duplicating files).
Interesting that Ruby forces you to save to disk manually - there's a lot that can go wrong there with content-type and encodings!
Yeah, but I think actually R may be the outlier in this case - httr and friends in R definitely make things simpler for users compared to other langs.
I think there are four options for
destination
I'll have a think about it
Okay, I can make a change for binary support
sorry about the delay on this, working on integrating httr2 here and in vcr, and it's rather complicated
@sckott I have a home rolled solution (with much reduced functionality and robustness) so safedata is working well with a proof of concept mocking of URLs. And I'm on leave for a fortnight, so there's absolutely no rush!