Convert rds to fst
Hello,
I have a large number of rds files (each larger than 100 MB) that I'm trying to work with. I just got familiar with your fst package, which looks promising for my needs, since I need to read a certain column each time and not necessarily the whole file. Since this is not something I can do with the rds format, I wondered if there is a way to convert the files from rds format to fst format.
Looking forward to your response.
Thank you!
fst is only able to write data.frame-like data. If your rds files all contain data.frames, you can simply read them into your computer's memory and write them to fst files. Otherwise, I don't think it's possible. Some illustrative code:
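# assumes the rds files sit in the current working directory
files <- list.files(pattern = "\\.rds$", full.names = TRUE, ignore.case = TRUE)
for (file in files) {
  tmp <- readRDS(file)  # must be a data.frame for write_fst to accept it
  # write next to the source file, swapping the extension
  fst_file <- sub("\\.rds$", ".fst", file, ignore.case = TRUE)
  fst::write_fst(tmp, fst_file)
}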
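Building on the loop above, here is a minimal sketch of the full round trip: convert one rds file, then read back only a single column, which is the use case from the question. The helper name rds_to_fst and the toy data are hypothetical, invented for illustration. The is.data.frame check is an added safeguard, not something the fst package does for you in this form.

```r
# Hypothetical helper: convert one rds file to fst, skipping objects
# that are not data.frames (fst cannot store those).
rds_to_fst <- function(rds_path,
                       fst_path = sub("\\.rds$", ".fst", rds_path,
                                      ignore.case = TRUE)) {
  obj <- readRDS(rds_path)  # loads the whole object into memory
  if (!is.data.frame(obj)) {
    warning("skipping ", rds_path, ": not a data.frame")
    return(invisible(NULL))
  }
  fst::write_fst(obj, fst_path)
  invisible(fst_path)
}

# Toy data standing in for a real rds file
rds_path <- file.path(tempdir(), "example.rds")
saveRDS(data.frame(id = 1:5, value = c(0.1, 0.2, 0.3, 0.4, 0.5)), rds_path)

fst_path <- rds_to_fst(rds_path)

# Read only the column that is needed, not the whole table
value_only <- fst::read_fst(fst_path, columns = "value")
```

The conversion still needs enough memory to hold each rds file in full. Only the later reads from the fst file are column-selective.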
Hi @mshamer, thanks for submitting your issue!
@shrektan is quite right: at the moment, it's only possible to read your rds file as a whole before writing it to fst format. But it would certainly make an interesting feature if it were possible to serialize a table stored in the rds format to the fst format column by column.
The advantage would be that such a conversion would (only) cost an amount of memory equal to the largest column in terms of memory size. We would have to study the rds format more closely to see if it's straightforward to read the table one column at a time.
At first glance (see for example https://cran.r-project.org/doc/manuals/r-release/R-ints.html#Serialization-Formats) the rds format is a recursive format, so reading one R object at a time should be possible through the R API.
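To get a feel for the memory argument above, here is a rough sketch in base R that compares the size of the largest column with the size of the whole table. The data is made up for illustration, and object.size only gives an estimate, so treat the ratio as indicative rather than exact.

```r
# Made-up table: two numeric-like columns and one character column
df <- data.frame(
  id    = 1:100000,
  value = rnorm(100000),
  label = as.character(1:100000),
  stringsAsFactors = FALSE
)

# Estimated size in bytes of each column and of the full table
col_bytes   <- sapply(df, function(col) as.numeric(object.size(col)))
total_bytes <- as.numeric(object.size(df))

# Fraction of the full-table footprint that a column-by-column
# conversion would need at its peak (the largest single column)
peak_fraction <- max(col_bytes) / total_bytes
```

For wide tables with many similar columns this fraction gets small, which is where a column-by-column conversion would pay off most.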
Thanks for submitting your feature request!
Thank you very much!
I appreciate your prompt responses.
Best,
M