read_csv is Ten Times Slower Than R Base read.csv

Open DarioS opened this issue 9 months ago • 0 comments

`read_csv` does not handle high-dimensional (wide) data efficiently. In the file below, columns are genes and rows are hospital patients (86 rows by 6289 columns).

On Linux or macOS, download the file:

wget https://figshare.com/ndownloader/files/47361049

Then, in R,

> library(readr)
> system.time(test <- read_csv("47361049"))
New names:
• `` -> `...1`
Rows: 86 Columns: 6289
── Column specification ───────────────────────────
dbl (6289): ...1, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
   user  system elapsed
  5.123  19.585  23.369
> system.time(test2 <- read.csv("47361049")) # Base case.
   user  system elapsed
  1.140   0.242   1.383
> sessionInfo()
R version 4.4.1 (2024-06-14)
Platform: x86_64-pc-linux-gnu
Running under: Debian GNU/Linux 12 (bookworm)
    ...        ...
time zone: Australia/Sydney
tzcode source: system (glibc)
attached base packages:
  stats     graphics  grDevices utils     datasets  methods   base
other attached packages:
  readr_2.1.5
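
To rule out the download and the particular file, a synthetic CSV of the same shape might serve as a reproduction. The following is only a sketch and was not run as part of this report: the random data, the variable names, and the choice of variants (declaring all columns as double, and reading with a single thread) are assumptions about where the overhead could come from, not a diagnosis. Unlike the real file, the synthetic one has no unnamed first column.

```r
library(readr)

# Synthetic stand-in for the real file: 86 rows by 6289 numeric columns.
set.seed(1)
n_rows <- 86
n_cols <- 6289
wide <- as.data.frame(matrix(rnorm(n_rows * n_cols), nrow = n_rows))
path <- tempfile(fileext = ".csv")
write.csv(wide, path, row.names = FALSE)

# 1. Default call, as in the report (message silenced only).
t_default <- system.time(
  res_default <- read_csv(path, show_col_types = FALSE)
)

# 2. Skip column type guessing by declaring every column as double.
t_typed <- system.time(
  res_typed <- read_csv(path, col_types = cols(.default = col_double()))
)

# 3. As in 2, but restricted to one thread.
t_single <- system.time(
  res_single <- read_csv(
    path,
    col_types = cols(.default = col_double()),
    num_threads = 1
  )
)

# 4. Base R for comparison.
t_base <- system.time(res_base <- read.csv(path))

# One row per variant; columns are user, system and elapsed seconds.
timings <- rbind(
  default = t_default,
  typed   = t_typed,
  single  = t_single,
  base    = t_base
)[, c("user.self", "sys.self", "elapsed")]
print(timings)
```

If variant 2 or 3 is much faster than variant 1, that would point towards type guessing or threading respectively. If all three `read_csv` variants are similarly slow, the cause is presumably elsewhere.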

The Description section of `?read_delim` is too brief and uninformative. Can you at least run the example above and reproduce the slowdown?

Feb 19 '25 01:02 DarioS