readr
readr copied to clipboard
coerce columns before checking column types in write_delim
Survival columns have to be coerced using output_column before passing to write_delim.
This example fails
tibble(surv_col = survival::Surv(seq(10), rep(1, 10))) %>%
write_delim(file = "tmp.tsv", delim = "\t")
This coerces columns prior to writing to file and doesn't fail
tibble(surv_col = survival::Surv(seq(10), rep(1, 10))) %>%
mutate_all(output_column) %>%
write_delim(file = "tmp.tsv", delim = "\t")
In write_delim here should x[] <- lapply(x, output_column) be called prior to check_column_types(x)?
I think this is probably correct reasoning, as it gives S3 classes built on top of types that would otherwise be unsupported a chance to be converted to either:
- Character using
output_column.default() - Whatever supported vroom type they want by implementing a custom
output_column.my_type()
For the uninitiated, a Surv object is a "vector"-like object that is actually built on a matrix, and they implement an as.character() method.
x <- survival::Surv(seq(10), rep(c(1, 0), 5))
x
#> [1] 1 2+ 3 4+ 5 6+ 7 8+ 9 10+
unclass(x)
#> time status
#> [1,] 1 1
#> [2,] 2 0
#> [3,] 3 1
#> [4,] 4 0
#> [5,] 5 1
#> [6,] 6 0
#> [7,] 7 1
#> [8,] 8 0
#> [9,] 9 1
#> [10,] 10 0
#> attr(,"type")
#> [1] "right"
# survival:::as.character.Surv
as.character(x)
#> [1] " 1" " 2+" " 3" " 4+" " 5" " 6+" " 7" " 8+" " 9" "10+"
So in theory that would be reasonable.
Similarly, types in {clock} are built as rcrd-types (on top of lists), which would be unsupported since readr/vroom isn't supposed to support list-cols directly (but see r-lib/vroom#391). But I also have a reasonable as.character() method implemented for those types that could be used here.
As mentioned in r-lib/vroom#391, check_column_types() is currently a little broken because it doesn't correctly check for list columns, so we should fix that too.