parquetize
Replace read_delim by read_delim_arrow
In the csv_to_parquet() function, it would be great to replace read_delim() from {readr} with read_delim_arrow() from {arrow}, which should improve performance.
Discarded for now: I can't figure out how to reproduce this feature with read_delim_arrow().
To be explored
Either of the following should pass the list of arguments on, I think.
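# Option 1: base R. do.call() takes the arguments as a single list;
# list(file) keeps file as one argument even if it is not a length-one string.
fun <- function(file, other_args = list()) {
  do.call(arrow::read_delim_arrow, c(list(file), other_args))
}
# Option 2: rlang. !!! splices the list into the call as individual arguments.
fun2 <- function(file, other_args = list()) {
  rlang::inject(arrow::read_delim_arrow(file, !!!other_args))
}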
# testing
url <- "https://github.com/tidyverse/readr/raw/main/inst/extdata/mtcars.csv"
bench::mark(
  fun(url),
  fun2(url),
  arrow::read_delim_arrow(url)
)
bench::mark(
  fun(url, list(col_names = FALSE, skip = 10)),
  fun2(url, list(col_names = FALSE, skip = 10)),
  arrow::read_delim_arrow(url, col_names = FALSE, skip = 10)
)