readr

Potential bug: readr crashes RStudio when reading a large CSV file

Open · pachadotdev opened this issue 3 years ago · 1 comment

As I explained on Stack Overflow (https://stackoverflow.com/questions/75657380/readr-vs-data-table-different-results-on-fedora?noredirect=1#comment133483471_75657380):

I replaced Ubuntu 20.04 with Fedora 37 on my laptop (clean install, 16 GB RAM) to follow my lab's standard. Curiously, readr now fails on a 6.7 GB CSV file: reading it crashes RStudio, while data.table::fread reads it fine. readr worked on Ubuntu. What could explain this?

library(archive)

url <- "https://www.usitc.gov/data/gravity/itpd_e/itpd_e_r02.zip"
zip <- gsub(".*/", "", url)

if (!file.exists(zip)) {
  try(download.file(url, zip, method = "wget", quiet = TRUE))
}

if (length(list.files(getwd(), pattern = "ITPD_E_R02\\.csv")) != 1) {
  archive_extract(zip, dir = getwd())
}

# this will crash RStudio
# trade <- readr::read_csv("/ITPD_E_R02.csv")

# this won't
trade <- data.table::fread("/ITPD_E_R02.csv")
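If the crash comes from memory pressure while readr materialises the whole 6.7 GB file at once (an assumption; nothing in this thread establishes the cause), one way to test that is to parse the file in pieces and keep only the rows you need. The sketch below uses readr's `read_csv_chunked()` with a `DataFrameCallback`. The helper name `read_csv_filtered()` and its `keep` argument are hypothetical, and keeping every row would still require the full data in memory.

```r
library(readr)

# Hypothetical helper (not part of readr): parse a CSV chunk by chunk and
# keep only the rows selected by `keep`, so the whole file is never held
# as one data frame unless every row is kept.
read_csv_filtered <- function(file,
                              keep = function(df) rep(TRUE, nrow(df)),
                              chunk_size = 1e6) {
  read_csv_chunked(
    file,
    # DataFrameCallback row-binds the data frames returned for each chunk
    callback = DataFrameCallback$new(function(x, pos) x[keep(x), , drop = FALSE]),
    chunk_size = chunk_size
  )
}

# Usage on a small throwaway file; for the real data, `file` would be the
# extracted CSV and `keep` a filter on its actual columns.
tmp <- tempfile(fileext = ".csv")
write_csv(data.frame(year = c(2015, 2016, 2016), value = 1:3), tmp)
res <- read_csv_filtered(tmp, keep = function(df) df$year == 2016, chunk_size = 2)
print(res)
```

Column types are guessed from the first chunk (`guess_max` defaults to `chunk_size`), so for real data it is safer to pass an explicit `col_types`.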

Free memory:

$ free -m
               total        used        free      shared  buff/cache   available
Mem:           15699        4332        1106        1032       10259        9957
Swap:           8191           9        8182

Session info:

sessionInfo()
R version 4.2.2 (2022-10-31)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: Fedora Linux 37 (Workstation Edition)

Matrix products: default
BLAS/LAPACK: /usr/lib64/libflexiblas.so.3.3

locale:
 [1] LC_CTYPE=en_CA.UTF-8       LC_NUMERIC=C               LC_TIME=en_CA.UTF-8       
 [4] LC_COLLATE=en_CA.UTF-8     LC_MONETARY=en_CA.UTF-8    LC_MESSAGES=en_CA.UTF-8   
 [7] LC_PAPER=en_CA.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
[10] LC_TELEPHONE=C             LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices datasets  utils     methods   base     

other attached packages:
[1] data.table_1.14.8 readr_2.1.4       archive_1.1.5    

loaded via a namespace (and not attached):
 [1] fansi_1.0.4       tzdb_0.3.0        utf8_1.2.3        R6_2.5.1          lifecycle_1.0.3  
 [6] magrittr_2.0.3    pillar_1.8.1      rlang_1.0.6       cli_3.6.0         rstudioapi_0.14  
[11] ellipsis_0.3.2    vctrs_0.5.2       tools_4.2.2       glue_1.6.2        hms_1.1.2        
[16] compiler_4.2.2    pkgconfig_2.0.3   CoprManager_0.5.0 tibble_3.1.8

pachadotdev, Mar 07 '23 20:03

Somewhat more minimal reprex:

library(archive)

url <- "https://www.usitc.gov/data/gravity/itpd_e/itpd_e_r02.zip"
path <- tempfile()

ua <- "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:47.0) Gecko/20100101 Firefox/47.0"
curl::multi_download(url, path, useragent = ua)

trade <- readr::read_csv(path, lazy = TRUE)

But this doesn't crash for me.

hadley, Jul 31 '23 22:07