purrr
purrr copied to clipboard
`assign_in` and `modify_in` should create nested elements
Current implementation of assign_in
, update_in
and pluck<-
doesn't create new elements.
This is in contrast to [[<-
semantics and most importantly to the parallel functional tooling in other languages (clojure, python).
assign_in(list(), "aa", 34)
#> Error: Plucked object must have at least one element
pluck(list(), "aa") <- 34
#> Error in pluck(list(), "aa") <- 34: invalid (NULL) left side of assignment
Moreover, this is inconsistent with pluck
itself which returns NULL
on missing elements. So one would expect no error for missing elements in pluck<-
as well.
Seems related: #636
It was on purpose. We are not following [[<-
semantics for adding new elements to vectors, at least by numerical position.
On the other hand I think we should make an exception for adding elements by name, as named indexing corresponds more to a map than to a vector interface. We didn't add this exception right away to have time to think about it.
I don't care much about numeric indexing, but may I ask why allowing numeric indexing in nested assignment is a problem? I do see a few use cases when extending verctors (or lists) through numeric pluck-ing can be useful.
may I ask why allowing numeric indexing in nested assignment is a problem
Indexing is not a problem, out-of-bounds indexing is. We have decided to make it an error in our vector semantics, for safety. It is easy to work around if necessary with some additional steps.
@lionel- Would it be possible to speed up the resolution on this one? Longer it takes, more difficult it would be to modify the behavior. I believe this is a critical functionality which is hard to achieve otherwise.
I don't expect we'll be working on purrr until after RStudio::conf.
In terms of collecting data, I would definitely have a use for this in googlesheets4 right now. I would very much like to assign, by name, a new element into every element of a list.
I think assign_in()
should be able to create and remove elements but I'm not sure whether modify_in()
should. It may be better to respect the invariant that the input size is the same as the output size in modify_
functions (with the nuance that for modify_in()
this invariant operates deeper than the input itself). Related issue: #716.
@lionel-
I believe, if there is no strong reasons for a restriction, then it better not be there. There are plenty of use cases for modify_in to be able to add elements, so it would be rather inconvenient if it couldn't.
It may be better to respect the invariant that the input size is the same as the output size in modify_
Why input/output size consistency is so important? I could see that there is a speed penalty only when the element is not there, but that's surely not a good reason to impose such a restriction.
One potential problem with modify_in
is how to pass through the "missing" value. If NULL, then the caller won't be able to differentiate between actual NULLs and missing NULLs. But for that, there might be an extra argument (missing_value
or something) providing the default missing value.
Why input/output size consistency is so important?
It allows making assumptions and helps reason about code. When you use map()
, you know for sure that the output has the same size as the input. I think modify()
should have the same constraints.
map
is a bad parallel. In all programming languages it does preserve the input size and it does so by very definition.
Programming should be practical; not enforcing unnecessary invariants unless really needed.
If in doubt about the semantics, it's a good idea to follow conventions already in place in other languages (e.g. python, clojure).
yeah but modify()
is a variant of map()
and modify_in()
is a variant of modify()
. Your concerns are noted.
And the assignment action that's contained in modify_in()
should be consistent with vctrs semantics of assignment.
I see. Valid point indeed. This transitivity didn't occur to me.
But then, a separate function update_in
could be doing what the similarly named function in other languages does. Can be?
This would be a very helpful feature for me
@lionel- ping. In the past two years I hardly ever used modify_in and assign_in. Inability to create nested elements is a killer. Any possibility to prioritize this issue?
Presumably works something like this:
assign_in(list(), "x", 1)
# list(x = 1)
assign_in(list(), 2, 1)
# list(NULL, 1)
assign_in(list(), c("x", "x"), 1)
# list(x = list(x = 1))
assign_in(list(), c("x", 2), 1)
# list(x = list(NULL, 1))
assign_in()
currently just constructs a call like x[["x"]][[2]] <- 1
and then evaluates it so this will need a fresh implementation.
This turned out to be surprisingly simple to implement because recursive sub-assignment has the desired properties:
`list_slice2<-` <- function(x, i, value) {
if (is.null(value)) {
x[i] <- list(NULL)
} else {
x[[i]] <- value
}
x
}
assign_in <- function(x, idx, value) {
n <- length(idx)
if (n > 1) {
value <- assign_in(x[[idx[[1]], idx[-1], value)
}
list_slice2(x, idx[[1]]) <- value
x
}
str(assign_in(list(), "x", 1))
#> List of 1
#> $ x: num 1
str(assign_in(list(), "x", NULL))
#> List of 1
#> $ x: NULL
str(assign_in(list(), 2, 1))
#> List of 2
#> $ : NULL
#> $ : num 1
str(assign_in(list(), c("x", "y"), 1))
#> List of 1
#> $ x:List of 1
#> ..$ y: num 1
str(assign_in(list(), c(2, 1), 1))
#> List of 2
#> $ : NULL
#> $ :List of 1
#> ..$ : num 1
str(assign_in(list(), list("x", 2), 1))
#> List of 1
#> $ x:List of 2
#> ..$ : NULL
#> ..$ : num 1
str(assign_in(list(), list(2, "x"), 1))
#> List of 2
#> $ : NULL
#> $ :List of 1
#> ..$ x: num 1
Created on 2022-09-06 with reprex v2.0.2