pins-r
pins-r copied to clipboard
could `pin_write()` support `type = "arrow"` uncompressed?
This may also impact pins for Python, but I thought I would start here.
The motivation for this question is that Arrow for JavaScript does not support compression. I know there's a workaround using pin_upload() and pin_download(), but it seems to add friction for R/Python users.
To be clear, I'm not suggesting that any default behavior change; just to ask about the possibility for something like a compression = FALSE argument somewhere.
Here's a larger writeup describing my motivation, with a first hack on how a BoardUrl could work using JavaScript.
Thanks!
Ah, that's too bad about compression in js :/ . Since pins stores metadata for everything, it seems like having a strategy for specifying save/load options there could be useful..
Maybe related in long-term to https://github.com/rstudio/pins-python/issues/18
Right now arguments aren't passed to savers when someone does pin_write, so a quick fix would be adding a new type (e.g. "uncompressed_arrow"?)
The way we handle the dots now from the user-facing function pin_write() passes them through to pin_store(), which means we can do things as outlined in #628 but we can't use the dots for this which happens in object_write().
I wonder if a new "uncompressed_arrow" type is perhaps the best way to go? Or should we try to support the different compress options that RDS has as well when considering this? We could add a new compression or compress argument to pin_write() after the dots without causing problems.
Could something like write_args, a list with arguments to pass to the writing-function, work here?
(I'm certain there's a better name)
object_write <- function(x, path, type = "rds", write_args = NULL) {
type <- arg_match0(type, setdiff(object_types, "file"))
switch(type,
arrow = rlang::exec(arrow::write_feather, x, path, !!!write_args)
# you get the idea
)
path
}
This might present a problem where write_args wants to override an already-named argument, e.g.:
json = jsonlite::write_json(x, path, auto_unbox = TRUE)
That could be a great option, I think. This would still have to go after the dots to avoid a breaking change; the function signature would end up looking like:
pin_write(
board,
x,
name = NULL,
type = NULL,
title = NULL,
description = NULL,
metadata = NULL,
versioned = NULL,
...
write_args = list()
)
I am on board!
At the risk of being pedantic, I wonder if the argument should be named .write_args, as it comes after the dots - I think this is to help avoid collisions with arg names that could be sent using the dots.
Happy to start on a PR, if you like.
I think adding the ability for people to customize how pins are saved will be really useful--but would prefer we separate out the immediate value of supporting uncompressed arrow, from the more high risk / high reward task of how to provide general extension for save / load.
Looking at tools like vetiver, it's clear that how things are saved and how they're loaded are coupled to each other. My concern is that addressing save on its own may work for self-describing formats (e.g. Rds, arrow), but I also expect customizing write to create new surprising behaviors.
If we could empower devs to steward the save/load process (e.g. create a new type that can do w/e it wants with something like write_args), it can help identify who is responsible when extension goes awry (e.g. error hit while using writing "some_extension_type", so we know it is not a pins responsibility to fix).
Holding for now :)
@ijlyttle Would you be open to doing a PR where we add a new type, perhaps called "custom" (what could be a better name?) here:
https://github.com/rstudio/pins-r/blob/debc94529b8b73cda41ab50419d6276cd6c56d24/R/pin-read-write.R#L163
We would expect folks to write code like pin_write(board, x, "my-name", .f = arrow::write_feather(compression = "uncompressed")). The argument .f would be new, after the dots.
And then the entry in switch() for the new custom type:
https://github.com/rstudio/pins-r/blob/debc94529b8b73cda41ab50419d6276cd6c56d24/R/pin-read-write.R#L126-L133
would be something like eval(rlang::call_modify(.f, x, path)). That seems to work pretty well:
library(rlang)
call <- quote(arrow::write_feather(compression = "uncompressed"))
path <- tempfile()
x <- mtcars
new_call <- call_modify(call, x, path)
eval(new_call)
head(arrow::read_feather(path))
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> 1 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
#> 2 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
#> 3 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
#> 4 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
#> 5 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
#> 6 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
Created on 2022-08-18 with reprex v2.0.2
We maybe should do a bit more careful checking of the call and use something like call_match(new_call, new_call[[1]])? And put a nice error message for users who pass in something that won't work.
What do you think?
I like it! And yes, happy to propose a PR!
Just so I don't forget, I'd want to be able to use pin_read() and have it read it like an Arrow file. This may be a bit ambitious, but I figured I'd get the thought out.
This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.