rrapply Documentation: Add example with melt/unmelt that has list elements without names

Open mpettis opened this issue 1 year ago • 1 comments

When a list has some unnamed sub-lists, a melt/unmelt round trip introduces names for those originally unnamed sub-lists. I propose adding the example below to the documentation to show how those names can be removed, provided that the genuinely named elements are character strings that cannot be parsed as integers. Here is the Stack Overflow question where I asked a related question and answered this one myself (@JorisChau actually provided the answer to my original question).

Stack Overflow answer: https://stackoverflow.com/a/76121308/1022967

Code example to be included with documentation/vignette/article?

# Melt and unmelt, remove names that arise from list positions in the original.
library(rrapply)

# Note: second level does not have names, just positions
lst <- list(A = list(list(A1A = 2, A1B = 3),
                     list(A2A = 2, A2B = 3)))
str(lst)
#> List of 1
#>  $ A:List of 2
#>   ..$ :List of 2
#>   .. ..$ A1A: num 2
#>   .. ..$ A1B: num 3
#>   ..$ :List of 2
#>   .. ..$ A2A: num 2
#>   .. ..$ A2B: num 3

# Melting to dataframe
melt_df <-
    rrapply(lst, how="melt")
melt_df
#>   L1 L2  L3 value
#> 1  A  1 A1A     2
#> 2  A  1 A1B     3
#> 3  A  2 A2A     2
#> 4  A  2 A2B     3

# Unmelt back to list, but now the second level has names that are character
# strings of their integer position.  Compare to original list.
unmelt_lst <-
    rrapply(melt_df, how="unmelt")
str(unmelt_lst)
#> List of 1
#>  $ A:List of 2
#>   ..$ 1:List of 2
#>   .. ..$ A1A: num 2
#>   .. ..$ A1B: num 3
#>   ..$ 2:List of 2
#>   .. ..$ A2A: num 2
#>   .. ..$ A2B: num 3

# Here is how we can remove those second level names that were just from their
# positions.  Note: this assumes that there are no other names in this list that
# are pure character versions of an integer.

isParseableInteger <- function(x) {
    # NULL or NA names never count as integers
    if (is.null(x) || is.na(x)) return(FALSE)

    # Numeric input: TRUE only if it equals its integer form
    if (is.numeric(x)) return(x == as.integer(x))

    # Character input: TRUE if coercion to integer succeeds (result is not NA)
    !is.na(suppressWarnings(as.integer(x)))
}

rrapply(
    unmelt_lst, 
    condition = \(x, .xname) {isParseableInteger(.xname)},
    f = \(x) {NULL},
    how = "names") |>
    str()
#> List of 1
#>  $ A:List of 2
#>   ..$ :List of 2
#>   .. ..$ A1A: num 2
#>   .. ..$ A1B: num 3
#>   ..$ :List of 2
#>   .. ..$ A2A: num 2
#>   .. ..$ A2B: num 3

Created on 2023-04-27 with reprex v2.0.2
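
One caveat about the check above: base R's `as.integer()` truncates strings such as "1.5" instead of rejecting them, so a name like "1.5" would also be treated as a position. Below is a minimal sketch of a stricter test, assuming that position names only ever consist of digits. `isDigitsOnlyName` is a hypothetical helper name, not part of rrapply.

```r
# Hypothetical helper (not part of rrapply): TRUE only for a single name
# made up entirely of digits, such as "1" or "12"
isDigitsOnlyName <- function(x) {
    !is.null(x) && !is.na(x) && grepl("^[0-9]+$", x)
}

# as.integer() truncates "1.5" to 1 rather than returning NA
suppressWarnings(as.integer(c("1", "1.5", "A1A")))
#> [1]  1  1 NA

# The regex-based check rejects "1.5" as well as "A1A"
vapply(c("1", "1.5", "A1A"), isDigitsOnlyName, logical(1), USE.NAMES = FALSE)
#> [1]  TRUE FALSE FALSE
```

If that assumption holds for your data, the helper could be swapped into the `condition` argument of the `rrapply()` call in place of `isParseableInteger`.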

Apr 27 '23 14:04 mpettis

The documentation already includes an example for fully unnamed nested lists (as opposed to partially named nested lists); see the note at the end of: https://jorischau.github.io/rrapply/articles/2-efficient-melting-unnesting.html#unmelt-to-nested-list.

The above solution feels like a non-ideal workaround. I think a more robust solution would be to not fill in missing names (with positions) by default when using how = "melt", so that the original partially named list can be retrieved with how = "unmelt". I just need to figure out how to recognize the nested list structure in this case, as the following melted data.frame would not contain sufficient information to reconstruct the nested list:

#>   L1   L2  L3 value
#> 1  A <NA> A1A     2
#> 2  A <NA> A1B     3
#> 3  A <NA> A2A     2
#> 4  A <NA> A2B     3
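
To make the information loss concrete, here is a rough base-R analogy. `unlist()` is used only as a stand-in for a melt that drops positions; it is not how rrapply works internally. The list `flat` is a made-up counterpart to the example list from the issue.

```r
# Two structurally different lists (base R only, no rrapply needed)
nested <- list(A = list(list(A1A = 2, A1B = 3), list(A2A = 2, A2B = 3)))
flat   <- list(A = list(A1A = 2, A1B = 3, A2A = 2, A2B = 3))

# unlist() builds names from the named levels only and skips the unnamed
# second level, so both lists collapse to the same named vector
identical(unlist(nested), unlist(flat))
#> [1] TRUE

# The original structures are nevertheless different
identical(nested, flat)
#> [1] FALSE
```

Once the positions are gone, nothing in the flattened result says whether A1A and A2A sat in the same sub-list or in different ones. The same ambiguity would apply to a melted data.frame whose L2 column is all NA.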

May 02 '23 13:05 JorisChau
