Convert rds to fst

Open mshamer opened this issue 6 years ago • 3 comments

Hello,

I have a large number of rds files (each file > 100 MB) that I'm trying to work with. I recently came across your fst package, which looks promising for my needs, since I need to read a certain column each time and not necessarily the whole file. Since this is not something I can do with the rds format, I wondered if there is a way to convert the files from rds format to fst format.

Looking forward to your response

Thank you!

mshamer avatar Apr 16 '18 05:04 mshamer

fst is only able to write data.frame-like data. If your rds files all contain data.frames, you can simply read each one into memory and write it to an fst file. Otherwise, I don't think it's possible. Some illustrative code:

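# assumption: the rds files sit in the working directory
files <- list.files(pattern = "\\.rds$", full.names = TRUE)
for (file in files) {
    tmp <- readRDS(file)                      # loads the whole object into memory
    fst_file <- sub("\\.rds$", ".fst", file)  # same name, .fst extension
    fst::write_fst(tmp, fst_file)
}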
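
Once a file has been converted, a single column can be read back without loading the rest of the table, which is the use case described in the question. A minimal sketch, assuming the fst package is installed; the data frame, column names, and temporary file paths are made up for illustration:

```r
library(fst)

# Hypothetical example data standing in for one of the rds files
df <- data.frame(
  id    = 1:5,
  value = c(0.1, 0.2, 0.3, 0.4, 0.5),
  label = letters[1:5]
)
rds_path <- tempfile(fileext = ".rds")
fst_path <- tempfile(fileext = ".fst")
saveRDS(df, rds_path)

# One-time conversion: the whole table is loaded into memory here
write_fst(readRDS(rds_path), fst_path)

# Later reads can pull only the column that is needed
value_only <- read_fst(fst_path, columns = "value")

# A row range can be combined with the column selection
first_rows <- read_fst(fst_path, columns = c("id", "value"), from = 1, to = 3)
```

The conversion step still needs enough memory for the full table; only the reads afterwards are selective.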

shrektan avatar Apr 16 '18 07:04 shrektan

Hi @mshamer, thanks for submitting your issue!

@shrektan is quite right: at the moment, an rds file has to be read as a whole before it can be written to the fst format. It would certainly make an interesting feature, though, if a table stored in the rds format could be serialized to the fst format column by column.

The advantage would be that such a conversion needs only as much memory as the largest column in the table. We would have to study the rds format more closely to see if it's straightforward to read the table one column at a time.

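At first glance (see for example the Serialization Formats section of R Internals, https://cran.r-project.org/doc/manuals/r-release/R-ints.html#Serialization-Formats), the rds format is a recursive format, so reading one R object at a time should be possible through the R API.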

Thanks for submitting your feature request!

MarcusKlik avatar Apr 16 '18 11:04 MarcusKlik

Thank you very much!

I appreciate your prompt responses

Best,

M

mshamer avatar Apr 16 '18 22:04 mshamer