read_csv Is More Than Ten Times Slower Than Base R read.csv
read_csv does not handle high-dimensional (wide) data efficiently. In this data set, columns are genes and rows are hospital patients: 86 rows by 6289 columns.
To reproduce on Linux or macOS, first download the file:
wget https://figshare.com/ndownloader/files/47361049
Then, in R,
> library(readr)
> system.time(test <- read_csv("47361049"))
New names:
• `` -> `...1`
Rows: 86 Columns: 6289
── Column specification ───────────────────────────
dbl (6289): ...1, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,...
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
user system elapsed
5.123 19.585 23.369
> system.time(test2 <- read.csv("47361049")) # Base case.
user system elapsed
1.140 0.242 1.383
> sessionInfo()
R version 4.4.1 (2024-06-14)
Platform: x86_64-pc-linux-gnu
Running under: Debian GNU/Linux 12 (bookworm)
... ...
time zone: Australia/Sydney
tzcode source: system (glibc)
attached base packages:
stats graphics grDevices utils datasets methods base
other attached packages:
readr_2.1.5
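The gap can also be checked without the figshare download. The R sketch below is an illustration and not part of the original report. It writes a synthetic file of random numbers with the same shape (86 rows, 6289 numeric columns) to a temporary path. It then times four reads: read_csv with default type guessing, read_csv with every column declared as double via `cols(.default = col_double())`, read_csv with `num_threads = 1`, and base read.csv. The sketch assumes that the shape alone, not the particular values, triggers the slowdown. If that assumption is wrong, the synthetic timings will not show the same gap.

```r
library(readr)

# Hypothetical stand-in for the figshare file: random values, same shape
# (86 rows x 6289 numeric columns) as the data in this report.
set.seed(1)
n_row <- 86
n_col <- 6289
wide <- as.data.frame(matrix(rnorm(n_row * n_col), nrow = n_row))
path <- tempfile(fileext = ".csv")
write.csv(wide, path, row.names = FALSE)

# Time readr with default type guessing, readr with every column declared
# as double (no guessing), readr limited to one thread, and base R.
timings <- rbind(
  readr_default = system.time(x1 <- read_csv(path, show_col_types = FALSE)),
  readr_typed   = system.time(x2 <- read_csv(path, col_types = cols(.default = col_double()))),
  readr_1thread = system.time(x3 <- read_csv(path, num_threads = 1, show_col_types = FALSE)),
  base_read_csv = system.time(x4 <- read.csv(path))
)

# Keep only the user, system and elapsed columns of the proc_time results.
print(timings[, c("user.self", "sys.self", "elapsed")])
```

The explicit column types rule out type guessing as the cause. The single-threaded run helps show whether the large system time comes from readr's multithreaded reading.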
Also, the Description section of ?read_delim is too brief to be informative. Could you at least run the reproduction steps with the figshare file and confirm the slowdown?