How to extract contents from a fst file when R crashes reading it

Open gabowi opened this issue 3 years ago • 2 comments

Hey everyone,

First of all, thank you for this package, which is quite helpful in our work. After writing and reading a lot of files without trouble, I have now run into a problem for the first time.

Trying to read a 12 GB fst file (using: read_fst(path_fstfile)), R crashes. The error message is: "R Session Aborted. R encountered a fatal error. The session was terminated."

This can be reproduced on different computers and from different sources (network, local drive). It is independent of whether data.table is loaded, and of whether the script is run through RStudio or from the command line using Rscript.exe. There is sufficient memory available (more than 100 GB RAM). Other fst files can be read successfully.

metadata_fst() works well on this file (see output below).

Is there any method to retrieve the contents of this file?

Thank you in advance for your help. Gabriel

> metadata_fst(path_fstfile)
<fst file>
120534568 rows, 43 columns (demandsimulationResult.fst)

* 'tripId'                   : integer
* 'legId'                    : integer
* 'personnumber'             : integer
* 'householdOid'             : integer
* 'personOid'                : integer
* ....

Note: other columns are of type character, double and logical.

> sessionInfo()
R version 4.1.0 (2021-05-18)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows Server x64 (build 17763)

Matrix products: default

locale:
[1] LC_COLLATE=German_Germany.1252  LC_CTYPE=German_Germany.1252    LC_MONETARY=German_Germany.1252
[4] LC_NUMERIC=C                    LC_TIME=German_Germany.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] fst_0.9.4

loaded via a namespace (and not attached):
[1] compiler_4.1.0 parallel_4.1.0 tools_4.1.0    Rcpp_1.0.7 

Note: On another computer with R version 4.1.2 the error occurs as well.

gabowi, Nov 24 '21 14:11

Have you tried incrementally reading parts of the file? E.g.

read_fst(path_fstfile, from=1, to=100)
read_fst(path_fstfile, from=100, to=1000)
read_fst(path_fstfile, from=120534468)
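
The same idea can be automated so that the whole file is scanned in fixed-size row blocks. What follows is only a sketch under stated assumptions: the helper name `scan_fst_in_chunks`, the chunk size and the log file are illustrative and not part of fst. Since a hard session abort cannot be caught from within R, each row range is written to a log file before it is read, so the last logged line after a crash points at the block that was being read. The demo runs on a small temporary file to stay self-contained; for the real file, `path_fstfile` would be passed instead.

```r
library(fst)

# Hypothetical helper (not part of fst): read a file block by block and
# log each row range before reading it, so that the last log entry
# identifies the block being read if the session aborts.
scan_fst_in_chunks <- function(path, chunk_rows = 1e6,
                               log_file = tempfile(fileext = ".log")) {
  n_rows <- metadata_fst(path)$nrOfRows
  starts <- seq(1, n_rows, by = chunk_rows)
  rows_read <- 0
  for (from in starts) {
    to <- min(from + chunk_rows - 1, n_rows)
    # Write the range to disk first; a crash loses anything kept only in memory.
    cat(sprintf("reading rows %.0f-%.0f\n", from, to),
        file = log_file, append = TRUE)
    chunk <- read_fst(path, from = from, to = to)
    rows_read <- rows_read + nrow(chunk)
  }
  rows_read
}

# Small demo file standing in for the real 12 GB file.
demo_path <- tempfile(fileext = ".fst")
write_fst(data.frame(tripId = 1:2500, legId = rep(1:5, 500)), demo_path)
total_rows <- scan_fst_in_chunks(demo_path, chunk_rows = 1000)
```

If every block reads cleanly, the blocks could instead be collected and combined (for example with `data.table::rbindlist`), which would also recover the contents of the file.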

fox34, Dec 21 '21 17:12

Hi @gabowi, did you check your memory consumption while the fst file is loading from disk? This sounds like your system doesn't have enough memory to read this file but that shouldn't crash R. Were the partial reads suggested by @fox34 successful?
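
One way to probe the memory question, and to narrow down whether a particular column triggers the crash, might be to read one column at a time via the `columns` argument of `read_fst`. This is a sketch, not a confirmed diagnosis: the helper name `scan_fst_by_column` is hypothetical, and the demo uses a small temporary file rather than the file from this issue.

```r
library(fst)

# Hypothetical helper (not part of fst): read each column separately and
# report its in-memory size in bytes. Only one column is held at a time.
scan_fst_by_column <- function(path) {
  col_names <- metadata_fst(path)$columnNames
  sizes <- numeric(length(col_names))
  names(sizes) <- col_names
  for (col in col_names) {
    # The last message printed before a crash names the column being read.
    message("reading column: ", col)
    values <- read_fst(path, columns = col)[[1]]
    sizes[col] <- as.numeric(object.size(values))
    rm(values)
  }
  sizes
}

# Small demo file standing in for the real file.
demo_path <- tempfile(fileext = ".fst")
write_fst(
  data.frame(tripId = 1:100, mode = rep(c("car", "bike"), 50),
             stringsAsFactors = FALSE),
  demo_path
)
column_bytes <- scan_fst_by_column(demo_path)
```

Summing `column_bytes` gives a rough estimate of the memory the full table would need once loaded.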

MarcusKlik, Nov 16 '22 11:11