parquetize

Replace read_delim by read_delim_arrow

Open ddotta opened this issue 1 year ago • 2 comments

In the csv_to_parquet() function, it would be great to replace read_delim() from {readr} with read_delim_arrow() from {arrow}, which should improve performance.

ddotta, May 17 '24 14:05

Discarded for now: I can't figure out how to reproduce this feature with read_delim_arrow().

To be explored

ddotta, May 17 '24 15:05

Either of the following should pass the list of arguments on, I think.

fun <- function(file, other_args = list()) {
  # list(file) keeps `file` as a single argument, whatever its type
  do.call(arrow::read_delim_arrow, c(list(file), other_args))
}

fun2 <- function(file, other_args = list()) {
  rlang::inject(arrow::read_delim_arrow(file, !!!other_args))
}

# testing
url <- "https://github.com/tidyverse/readr/raw/main/inst/extdata/mtcars.csv"

bench::mark(
  fun(url),
  fun2(url),
  arrow::read_delim_arrow(url)
)

bench::mark(
  fun(url, list(col_names = FALSE, skip = 10)),
  fun2(url, list(col_names = FALSE, skip = 10)),
  arrow::read_delim_arrow(url, col_names = FALSE, skip = 10)
)
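
Applied to the csv_to_parquet() case from the issue, the do.call() approach might look like the rough sketch below. This is only an illustration: csv_to_parquet_sketch(), its argument names, and the read_args parameter are hypothetical and are not taken from {parquetize}; only arrow::read_delim_arrow(), arrow::write_parquet() and arrow::read_parquet() are real {arrow} functions.

```r
# Hypothetical helper, NOT the actual {parquetize} implementation:
# reads a delimited file with {arrow}, then writes it out as Parquet.
csv_to_parquet_sketch <- function(path_to_file, path_to_parquet,
                                  delim = ",", read_args = list()) {
  # Wrap the fixed arguments in list() so c() always returns a list,
  # then append whatever extra reader options the caller supplied.
  args <- c(list(file = path_to_file, delim = delim), read_args)
  tbl <- do.call(arrow::read_delim_arrow, args)
  arrow::write_parquet(tbl, path_to_parquet)
  invisible(path_to_parquet)
}

# Usage on a small temporary file (assumes {arrow} is installed)
csv <- tempfile(fileext = ".csv")
write.csv(head(mtcars), csv, row.names = FALSE)
out <- tempfile(fileext = ".parquet")
csv_to_parquet_sketch(csv, out, read_args = list(skip = 0))
result <- arrow::read_parquet(out)
```

Passing the extra options as a list (rather than through `...`) keeps them separate from the wrapper's own arguments, at the cost of an error if read_args repeats `file` or `delim`.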

ds-turner, Mar 25 '25 10:03