purrr icon indicating copy to clipboard operation
purrr copied to clipboard

`assign_in` and `modify_in` should create nested elements

Open vspinu opened this issue 5 years ago • 15 comments

Current implementation of assign_in, update_in and pluck<- doesn't create new elements. This is in contrast to [[<- semantics and most importantly to the parallel functional tooling in other languages (clojure, python).

assign_in(list(), "aa", 34)
#> Error: Plucked object must have at least one element
pluck(list(), "aa") <- 34
#> Error in pluck(list(), "aa") <- 34: invalid (NULL) left side of assignment

Moreover, this is inconsistent with pluck itself which returns NULL on missing elements. So one would expect no error for missing elements in pluck<- as well.

Seems related: #636

vspinu avatar Sep 16 '19 09:09 vspinu

It was on purpose. We are not following [[<- semantics for adding new elements to vectors, at least by numerical position.

On the other hand I think we should make an exception for adding elements by name, as named indexing corresponds more to a map than to a vector interface. We didn't add this exception right away to have time to think about it.

lionel- avatar Sep 16 '19 09:09 lionel-

I don't care much about numeric indexing, but may I ask why allowing numeric indexing in nested assignment is a problem? I do see a few use cases when extending verctors (or lists) through numeric pluck-ing can be useful.

vspinu avatar Sep 16 '19 10:09 vspinu

may I ask why allowing numeric indexing in nested assignment is a problem

Indexing is not a problem, out-of-bounds indexing is. We have decided to make it an error in our vector semantics, for safety. It is easy to work around if necessary with some additional steps.

lionel- avatar Oct 15 '19 15:10 lionel-

@lionel- Would it be possible to speed up the resolution on this one? Longer it takes, more difficult it would be to modify the behavior. I believe this is a critical functionality which is hard to achieve otherwise.

vspinu avatar Dec 18 '19 12:12 vspinu

I don't expect we'll be working on purrr until after RStudio::conf.

lionel- avatar Dec 18 '19 15:12 lionel-

In terms of collecting data, I would definitely have a use for this in googlesheets4 right now. I would very much like to assign, by name, a new element into every element of a list.

jennybc avatar Dec 21 '19 21:12 jennybc

I think assign_in() should be able to create and remove elements but I'm not sure whether modify_in() should. It may be better to respect the invariant that the input size is the same as the output size in modify_ functions (with the nuance that for modify_in() this invariant operates deeper than the input itself). Related issue: #716.

lionel- avatar Jan 20 '20 16:01 lionel-

@lionel-

I believe, if there is no strong reasons for a restriction, then it better not be there. There are plenty of use cases for modify_in to be able to add elements, so it would be rather inconvenient if it couldn't.

It may be better to respect the invariant that the input size is the same as the output size in modify_

Why input/output size consistency is so important? I could see that there is a speed penalty only when the element is not there, but that's surely not a good reason to impose such a restriction.

One potential problem with modify_in is how to pass through the "missing" value. If NULL, then the caller won't be able to differentiate between actual NULLs and missing NULLs. But for that, there might be an extra argument (missing_value or something) providing the default missing value.

vspinu avatar Jan 24 '20 09:01 vspinu

Why input/output size consistency is so important?

It allows making assumptions and helps reason about code. When you use map(), you know for sure that the output has the same size as the input. I think modify() should have the same constraints.

lionel- avatar Jan 24 '20 10:01 lionel-

map is a bad parallel. In all programming languages it does preserve the input size and it does so by very definition.

Programming should be practical; not enforcing unnecessary invariants unless really needed.

If in doubt about the semantics, it's a good idea to follow conventions already in place in other languages (e.g. python, clojure).

vspinu avatar Jan 24 '20 21:01 vspinu

yeah but modify() is a variant of map() and modify_in() is a variant of modify(). Your concerns are noted.

lionel- avatar Jan 25 '20 08:01 lionel-

And the assignment action that's contained in modify_in() should be consistent with vctrs semantics of assignment.

lionel- avatar Jan 25 '20 09:01 lionel-

I see. Valid point indeed. This transitivity didn't occur to me.

But then, a separate function update_in could be doing what the similarly named function in other languages does. Can be?

vspinu avatar Jan 26 '20 19:01 vspinu

This would be a very helpful feature for me

mattpollock avatar Dec 10 '20 21:12 mattpollock

@lionel- ping. In the past two years I hardly ever used modify_in and assign_in. Inability to create nested elements is a killer. Any possibility to prioritize this issue?

vspinu avatar Sep 29 '21 18:09 vspinu

Presumably works something like this:

assign_in(list(), "x", 1)
# list(x = 1)

assign_in(list(), 2, 1)
# list(NULL, 1)

assign_in(list(), c("x", "x"), 1)
# list(x = list(x = 1))

assign_in(list(), c("x", 2), 1)
# list(x = list(NULL, 1))

assign_in() currently just constructs a call like x[["x"]][[2]] <- 1 and then evaluates it so this will need a fresh implementation.

hadley avatar Sep 06 '22 21:09 hadley

This turned out to be surprisingly simple to implement because recursive sub-assignment has the desired properties:

`list_slice2<-` <- function(x, i, value) {
  if (is.null(value)) {
    x[i] <- list(NULL)
  } else {
    x[[i]] <- value
  }
  x
}

assign_in <- function(x, idx, value) {
  n <- length(idx)

  if (n > 1) {
    value <- assign_in(x[[idx[[1]], idx[-1], value)
  }
  list_slice2(x, idx[[1]]) <- value
  x
}

str(assign_in(list(), "x", 1))
#> List of 1
#>  $ x: num 1
str(assign_in(list(), "x", NULL))
#> List of 1
#>  $ x: NULL
str(assign_in(list(), 2, 1))
#> List of 2
#>  $ : NULL
#>  $ : num 1

str(assign_in(list(), c("x", "y"), 1))
#> List of 1
#>  $ x:List of 1
#>   ..$ y: num 1
str(assign_in(list(), c(2, 1), 1))
#> List of 2
#>  $ : NULL
#>  $ :List of 1
#>   ..$ : num 1

str(assign_in(list(), list("x", 2), 1))
#> List of 1
#>  $ x:List of 2
#>   ..$ : NULL
#>   ..$ : num 1
str(assign_in(list(), list(2, "x"), 1))
#> List of 2
#>  $ : NULL
#>  $ :List of 1
#>   ..$ x: num 1

Created on 2022-09-06 with reprex v2.0.2

hadley avatar Sep 06 '22 22:09 hadley