readr icon indicating copy to clipboard operation
readr copied to clipboard

coerce columns before checking column types in write_delim

Open jsleight1 opened this issue 3 years ago • 2 comments

Survival columns have to be coerced using output_column before passing to write_delim.

This example fails

tibble(surv_col = survival::Surv(seq(10), rep(1, 10))) %>% 
    write_delim(file = "tmp.tsv", delim = "\t")

This coerces columns prior to writing to file and doesn't fail

tibble(surv_col = survival::Surv(seq(10), rep(1, 10))) %>% 
    mutate_all(output_column) %>%
    write_delim(file = "tmp.tsv", delim = "\t")

In write_delim here should x[] <- lapply(x, output_column) be called prior to check_column_types(x)?

jsleight1 avatar Dec 06 '21 14:12 jsleight1

I think this is probably correct reasoning, as it gives S3 classes built on top of types that would otherwise be unsupported a chance to be converted to either:

  • Character using output_column.default()
  • Whatever supported vroom type they want by implementing a custom output_column.my_type()

For the uninitiated, a Surv object is a "vector"-like object that is actually built on a matrix, and they implement an as.character() method.

x <- survival::Surv(seq(10), rep(c(1, 0), 5))

x
#>  [1]  1   2+  3   4+  5   6+  7   8+  9  10+

unclass(x)
#>       time status
#>  [1,]    1      1
#>  [2,]    2      0
#>  [3,]    3      1
#>  [4,]    4      0
#>  [5,]    5      1
#>  [6,]    6      0
#>  [7,]    7      1
#>  [8,]    8      0
#>  [9,]    9      1
#> [10,]   10      0
#> attr(,"type")
#> [1] "right"

# survival:::as.character.Surv
as.character(x)
#>  [1] " 1"  " 2+" " 3"  " 4+" " 5"  " 6+" " 7"  " 8+" " 9"  "10+"

So in theory that would be reasonable.

Similarly, types in {clock} are built as rcrd-types (on top of lists), which would be unsupported since readr/vroom isn't supposed to support list-cols directly (but see r-lib/vroom#391). But I also have a reasonable as.character() method implemented for those types that could be used here.

DavisVaughan avatar Dec 13 '21 14:12 DavisVaughan

As mentioned in r-lib/vroom#391, check_column_types() is currently a little broken because it doesn't correctly check for list columns, so we should fix that too.

DavisVaughan avatar Dec 13 '21 14:12 DavisVaughan