
tib_df and empty array

Open krlmlr opened this issue 3 years ago • 2 comments

I'm seeing weird references to "colmajor" when an empty JSON array [] is parsed by a tib_df(). What am I doing wrong?

CC @TSchiefer.

library(tibblify)

json <- '[{ "a": 1, "b": [{ "c": 1, "d": 2 }, {}] }, { "a": 2, "b": [] }]'
nested_list <- jsonlite::fromJSON(json)

spec <- tibblify::guess_tspec(nested_list)
spec
#> tspec_df(
#>   tib_int("a"),
#>   tib_df(
#>     "b",
#>     tib_int("c", required = FALSE),
#>     tib_int("d", required = FALSE),
#>   ),
#> )
tibblify::tibblify(nested_list, spec)
#> Error in `tibblify::tibblify()`:
#> ! Problem while tibblifying `x$b[[2]]$c`
#> Caused by error in `withCallingHandlers()`:
#> ! Field is absent in colmajor.
#> ℹ In file 'add-value.c' at line 395.
#> ℹ This is an internal error that was detected in the base package.
#> Backtrace:
#>     ▆
#>  1. ├─tibblify::tibblify(nested_list, spec)
#>  2. │ └─rlang::try_fetch(...)
#>  3. │   ├─base::tryCatch(...)
#>  4. │   │ └─base (local) tryCatchList(expr, classes, parentenv, handlers)
#>  5. │   │   └─base (local) tryCatchOne(expr, names, parentenv, handlers[[1L]])
#>  6. │   │     └─base (local) doTryCatch(return(expr), name, parentenv, handler)
#>  7. │   └─base::withCallingHandlers(...)
#>  8. └─rlang:::stop_internal_c_lib(...)
#>  9.   └─rlang::abort(message, call = call, .internal = TRUE, .frame = frame)

json <- '[{ "a": 1, "b": [{ "c": 1, "d": 2 }, {}] }, { "a": 2, "b": [{ "c": 1 }] }]'
nested_list <- jsonlite::fromJSON(json)

spec <- tibblify::guess_tspec(nested_list)
spec
#> tspec_df(
#>   tib_int("a"),
#>   tib_df(
#>     "b",
#>     tib_int("c"),
#>     tib_int("d", required = FALSE),
#>   ),
#> )
tibblify::tibblify(nested_list, spec)
#> Error in `tibblify::tibblify()`:
#> ! Field d is required but does not exist in `x$b[[2]]`.
#> ℹ For `.input_form = "colmajor"` every field is required.
#> Backtrace:
#>      ▆
#>   1. ├─tibblify::tibblify(nested_list, spec)
#>   2. │ └─rlang::try_fetch(...)
#>   3. │   ├─base::tryCatch(...)
#>   4. │   │ └─base (local) tryCatchList(expr, classes, parentenv, handlers)
#>   5. │   │   └─base (local) tryCatchOne(expr, names, parentenv, handlers[[1L]])
#>   6. │   │     └─base (local) doTryCatch(return(expr), name, parentenv, handler)
#>   7. │   └─base::withCallingHandlers(...)
#>   8. └─tibblify:::stop_required_colmajor(`<named list>`)
#>   9.   └─tibblify:::tibblify_abort(msg)
#>  10.     └─cli::cli_abort(..., class = "tibblify_error", .envir = .envir)
#>  11.       └─rlang::abort(...)

json <- '[{ "a": 1, "b": [{ "c": 1, "d": 2 }, {}] }, { "a": 2, "b": null }]'
nested_list <- jsonlite::fromJSON(json)

spec <- tibblify::guess_tspec(nested_list)
spec
#> tspec_df(
#>   tib_int("a"),
#>   tib_df(
#>     "b",
#>     tib_int("c", required = FALSE),
#>     tib_int("d", required = FALSE),
#>   ),
#> )
tibblify::tibblify(nested_list, spec)
#> # A tibble: 2 × 2
#>       a                  b
#>   <int> <list<tibble[,2]>>
#> 1     1            [2 × 2]
#> 2     2

Created on 2023-04-17 with reprex v2.0.2

krlmlr, Apr 17 '23 20:04

This is because tibblify uses the colmajor code path when the input is a data frame, which is what jsonlite::fromJSON() returns here. That indeed makes the error message quite confusing. Regarding the errors themselves:

  1. Empty tibble
json <- '[{ "a": 1, "b": [{ "c": 1, "d": 2 }, {}] }, { "a": 2, "b": [] }]'
nested_list <- tibble::as_tibble(jsonlite::fromJSON(json))
nested_list
#> # A tibble: 2 × 2
#>       a b           
#>   <int> <list>      
#> 1     1 <df [2 × 2]>
#> 2     2 <df [0 × 0]>

Created on 2023-07-07 with reprex v2.0.2

In the colmajor format (and therefore in data frames) all columns are required. So to me it kind of makes sense to error here, but the message is quite confusing.

  2. No column d
json <- '[{ "a": 1, "b": [{ "c": 1, "d": 2 }, {}] }, { "a": 2, "b": [{ "c": 1 }] }]'
nested_list <- tibble::as_tibble(jsonlite::fromJSON(json))
nested_list
#> # A tibble: 2 × 2
#>       a b           
#>   <int> <list>      
#> 1     1 <df [2 × 2]>
#> 2     2 <df [1 × 1]>

Created on 2023-07-07 with reprex v2.0.2

Basically the same case as before: the nested data frame has no column d, and the colmajor format requires every column.

  3. NULL
json <- '[{ "a": 1, "b": [{ "c": 1, "d": 2 }, {}] }, { "a": 2, "b": null }]'
nested_list <- tibble::as_tibble(jsonlite::fromJSON(json))
nested_list
#> # A tibble: 2 × 2
#>       a b           
#>   <int> <list>      
#> 1     1 <df [2 × 2]>
#> 2     2 <NULL>

Created on 2023-07-07 with reprex v2.0.2

This works because NULL gets special treatment as the missing value of a list.
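
To compare the three cases side by side, it can help to look at the column names of each nested data frame directly. This is a minimal sketch using only jsonlite and base R; the helper `nested_cols()` is a hypothetical name introduced here for illustration and is not part of tibblify or jsonlite.

```r
library(jsonlite)

# Hypothetical helper (not part of any package): parse JSON with jsonlite's
# default simplification and return the column names of every element of
# the list-column `b`.
nested_cols <- function(json) {
  parsed <- fromJSON(json)
  lapply(parsed$b, names)
}

json_empty   <- '[{ "a": 1, "b": [{ "c": 1, "d": 2 }, {}] }, { "a": 2, "b": [] }]'
json_partial <- '[{ "a": 1, "b": [{ "c": 1, "d": 2 }, {}] }, { "a": 2, "b": [{ "c": 1 }] }]'
json_null    <- '[{ "a": 1, "b": [{ "c": 1, "d": 2 }, {}] }, { "a": 2, "b": null }]'

cols_empty   <- nested_cols(json_empty)
cols_partial <- nested_cols(json_partial)
cols_null    <- nested_cols(json_null)

# Inspect which columns the second row's nested value actually carries.
str(cols_empty)
str(cols_partial)
str(cols_null)
```

Going by the tibble printouts above, the second nested value has no columns in the first case, only `c` in the second, and is `NULL` in the third, which matches the three outcomes reported in the issue.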

mgirlich, Jul 07 '23 12:07

But it is also a bit annoying that all three examples work with the same spec when the JSON is parsed with simplifyDataFrame = FALSE.
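
A rough sketch of that workaround, under the assumption that keeping the parsed JSON as a plain list of records avoids the data frame (colmajor) code path. The spec is written out by hand, copied from the guessed spec in the issue, and the tibblify call is guarded so the jsonlite part runs even without tibblify installed. No output is shown because it has not been verified here.

```r
library(jsonlite)

json <- '[{ "a": 1, "b": [{ "c": 1, "d": 2 }, {}] }, { "a": 2, "b": [] }]'

# Default parsing simplifies the array of objects into a data frame.
simplified <- fromJSON(json)
is.data.frame(simplified)

# With simplifyDataFrame = FALSE the records stay a list of named lists.
records <- fromJSON(json, simplifyDataFrame = FALSE)
is.data.frame(records)

# Assumption: this hand-written spec matches the one guessed in the issue.
if (requireNamespace("tibblify", quietly = TRUE)) {
  spec <- tibblify::tspec_df(
    tibblify::tib_int("a"),
    tibblify::tib_df(
      "b",
      tibblify::tib_int("c", required = FALSE),
      tibblify::tib_int("d", required = FALSE)
    )
  )
  print(tibblify::tibblify(records, spec))
}
```

The other two JSON strings from the issue can be swapped in for `json` to check the claim that one spec covers all three inputs.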

mgirlich, Jul 07 '23 12:07