
`read_delim_chunked` takes much more memory than expected?

Open timothy-barry opened this issue 2 years ago • 6 comments

I am using `read_delim_chunked` to process large text files chunk by chunk. My expectation is that the memory for each chunk is released once that chunk has been processed. However, this does not seem to be the case: reading the file in chunks appears to require as much memory as reading it all at once. I assume that this is a bug, but maybe my understanding of `read_delim_chunked` is incorrect. The purpose of reading by chunk is to conserve memory, right? Thanks!
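For concreteness, here is a minimal sketch of the kind of chunked read described above. The file, its column names (`a`, `b`, `c`), and the helper `sum_a` are made up for illustration and are not taken from the original report. The callback folds each chunk into a running total, so in principle only one chunk plus the accumulator needs to be alive at any time.

```r
library(readr)

# Hypothetical example file: 1000 rows, three space-delimited integer columns.
path <- tempfile(fileext = ".txt")
write_delim(
  data.frame(a = 1:1000, b = 1001:2000, c = 2001:3000),
  path,
  delim = " "
)

# Callback for AccumulateCallback: receives the chunk, its starting
# position, and the accumulator, and returns the updated accumulator.
sum_a <- function(x, pos, acc) acc + sum(x$a)

total <- read_delim_chunked(
  file = path,
  callback = AccumulateCallback$new(sum_a, acc = 0),
  delim = " ",
  chunk_size = 100,
  col_types = "iii"
)
```

With `chunk_size = 100` the file is processed in ten pieces; the expectation in this issue is that peak memory would track the chunk size rather than the file size.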

timothy-barry avatar Jul 10 '22 21:07 timothy-barry

After a bit of searching through the issues on this repo, I noticed that at least one other person seems to be encountering this issue as well: https://github.com/tidyverse/readr/issues/1120#issuecomment-1055255383.

timothy-barry avatar Jul 10 '22 22:07 timothy-barry

Additional note: this seems to be a more pervasive issue than I had realized. I tried loading a sequence of files via `readr::read_delim`. R ran out of memory even though (i) each file fits into memory on its own and (ii) I loaded the files one at a time, removing each before loading the next.

# readr: runs out of memory
for (f in fs) {
  print(paste0("Loading ", f))
  x <- readr::read_delim(file = f,
                         delim = " ",
                         skip = 2,
                         col_types = c("iii"))
  rm(x); gc()
}

I repeated this experiment with data.table's fread function; everything works as expected.

# data.table: everything works
for (f in fs) {
  print(paste0("Loading ", f))
  x <- data.table::fread(file = f,
                         sep = " ",
                         colClasses = c("integer", "integer", "integer"),
                         skip = 2)
  rm(x); gc()
}

As far as I can tell, the current version of readr seems to have a more general memory leak, one not limited to the chunked readers, unfortunately.

timothy-barry avatar Jul 11 '22 15:07 timothy-barry

I am having the same issue. The memory use increases almost monotonically even though the individual chunks are small.

ben18785 avatar Oct 30 '22 23:10 ben18785

Any updates or workarounds? Can I use edition 1 (via with_edition(1, ...) or local_edition(1)) to resolve this issue, at least for the time being?
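For anyone wanting to try this, a minimal sketch of the two ways to switch editions follows. The example file and the wrapper `read_first_edition` are hypothetical, and nothing in this thread confirms that edition 1 actually avoids the memory growth; this only shows how the switch is made.

```r
library(readr)

# Hypothetical example file with three space-delimited integer columns.
path <- tempfile(fileext = ".txt")
write_delim(data.frame(a = 1:3, b = 4:6, c = 7:9), path, delim = " ")

# Option 1: run a single call under the first-edition parser.
x <- with_edition(1, read_delim(path, delim = " ", col_types = "iii"))

# Option 2: switch editions for the remainder of the enclosing function;
# the previous edition is restored when the function exits.
read_first_edition <- function(file) {
  local_edition(1)
  read_delim(file, delim = " ", col_types = "iii")
}
y <- read_first_edition(path)
```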

timothy-barry avatar Dec 21 '22 03:12 timothy-barry

Having the same problem here.

arthurgailes avatar Jan 25 '23 22:01 arthurgailes

To investigate this issue we'll need a reprex, and some indication of how you're measuring R's memory consumption.

hadley avatar Jul 31 '23 22:07 hadley
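
A rough sketch of what such a reprex could look like is below. It is an assumption-laden starting point, not a confirmed reproduction: the files are synthetic stand-ins generated by a made-up helper (`make_file`), and memory is measured with a made-up helper (`r_heap_mb`) that sums the "used (Mb)" column of `gc()`.

```r
library(readr)

# Hypothetical stand-in data: a space-delimited file with two junk lines
# before the header, mirroring the skip = 2 used earlier in this thread.
make_file <- function(n_rows = 1e4) {
  path <- tempfile(fileext = ".txt")
  writeLines(c("junk line 1", "junk line 2"), path)
  write_delim(
    data.frame(a = seq_len(n_rows), b = seq_len(n_rows), c = seq_len(n_rows)),
    path,
    delim = " ",
    append = TRUE,
    col_names = TRUE
  )
  path
}

fs <- vapply(1:5, function(i) make_file(), character(1))

# R heap in use after a garbage collection, in Mb (Ncells + Vcells).
r_heap_mb <- function() sum(gc()[, 2])

# Read the files one at a time and record heap usage after each is removed.
usage <- numeric(length(fs))
for (i in seq_along(fs)) {
  x <- read_delim(fs[i], delim = " ", skip = 2, col_types = "iii")
  rm(x)
  usage[i] <- r_heap_mb()
}

print(usage)
sessionInfo()
```

Note that `gc()` only reports memory managed by R's own heap. If the growth is in memory allocated outside that heap, it would not show up in `usage`, so it is worth stating alongside the reprex whether the numbers come from `gc()` or from the process's resident memory as reported by the operating system.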