
Serialize any R object and compress it in C++ (feature request)

Open • dipterix opened this issue on Oct 25 '19 • 2 comments

Hi, thanks for the amazing package you have developed. I recently found that the compression functions in the fst package only take raw vectors as arguments. This means that if I want to compress an R object in memory, I have to serialize it first and then compress the result. Would it be possible to take any R object and directly serialize and compress it in C++?

I found this C++ source code that serializes R objects from C++: https://github.com/eddelbuettel/rapiserialize/blob/master/src/serialize.cpp

dipterix • Oct 25 '19 17:10

You may want to check out qs.

renkun-ken • Oct 26 '19 01:10

Hi @dipterix, thanks for your request!

Indeed, as @renkun-ken points out, the qs package is great for serializing generic R objects. It uses the same compressors as fst (LZ4 and ZSTD) to speed up compression, and it also employs a block-compression design to enable multi-threaded compression and serialization.
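The block-compression idea can be sketched roughly as follows: the input buffer is cut into fixed-size blocks and each block is compressed independently on its own task, which is what makes multi-threading possible. This is a minimal illustration under stated assumptions, not the actual qs or fst code: `rle_compress`/`rle_decompress` are a hypothetical run-length-encoding stand-in for LZ4 or ZSTD, and `compress_blocks`, `decompress_blocks` and the block size are illustrative names and parameters.

```cpp
#include <cstddef>
#include <future>
#include <string>
#include <vector>

// Hypothetical stand-in codec: run-length encoding as (count, value) byte pairs.
// A real implementation would call LZ4 or ZSTD here instead.
std::string rle_compress(const std::string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size();) {
        std::size_t run = 1;
        while (i + run < in.size() && in[i + run] == in[i] && run < 255) ++run;
        out.push_back(static_cast<char>(run)); // run length fits in one byte (<= 255)
        out.push_back(in[i]);
        i += run;
    }
    return out;
}

std::string rle_decompress(const std::string& in) {
    std::string out;
    for (std::size_t i = 0; i + 1 < in.size(); i += 2) {
        out.append(static_cast<unsigned char>(in[i]), in[i + 1]); // repeat value `count` times
    }
    return out;
}

// Split the input into fixed-size blocks and compress each block on its own async task.
// Blocks are independent, so they can be compressed (and later decompressed) in parallel.
std::vector<std::string> compress_blocks(const std::string& data, std::size_t block_size) {
    std::vector<std::future<std::string>> tasks;
    for (std::size_t pos = 0; pos < data.size(); pos += block_size) {
        std::string block = data.substr(pos, block_size);
        tasks.push_back(std::async(std::launch::async, [block] { return rle_compress(block); }));
    }
    std::vector<std::string> blocks;
    for (auto& task : tasks) blocks.push_back(task.get()); // collect in submission order
    return blocks;
}

// Restore the original buffer by decompressing the blocks in order and concatenating them.
std::string decompress_blocks(const std::vector<std::string>& blocks) {
    std::string out;
    for (const auto& block : blocks) out += rle_decompress(block);
    return out;
}
```

Collecting the futures in submission order keeps the blocks in their original sequence, so concatenating the decompressed blocks restores the input. Because each block is self-contained, a single block can also be decompressed without touching the others.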

But if you need random access to your (column) vectors, you can only get that with fst. Because fst is centered around data.frame (and equivalent) data structures, it can specialize and offer such functionality.

For example, random-access columns of type list, which can hold any type of data structure, would certainly be a very interesting feature to have. To implement that, direct access to R's serialization API is very useful, because we can call R's serialize method on the main thread while compressing the results in the background. So thanks a lot for the pointer to rapiserialize!
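The "serialize on the main thread, compress in the background" pattern is essentially a producer/consumer pipeline, sketched below under stated assumptions. Here `produce` stands in for R's serializer emitting bytes chunk by chunk; wiring it to R's C-level API (presumably through an output-stream callback set up with `R_InitOutPStream` and driven by `R_Serialize`, as rapiserialize appears to do) is an assumption and is not shown. `compress` stands in for an LZ4 or ZSTD call. `ChunkQueue` and `serialize_and_compress` are hypothetical names, not part of fst.

```cpp
#include <condition_variable>
#include <mutex>
#include <optional>
#include <queue>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Minimal blocking queue that hands chunks from the producer to the compressor thread.
class ChunkQueue {
public:
    void push(std::string chunk) {
        { std::lock_guard<std::mutex> lock(m_); q_.push(std::move(chunk)); }
        cv_.notify_one();
    }
    void close() {
        { std::lock_guard<std::mutex> lock(m_); closed_ = true; }
        cv_.notify_one();
    }
    // Blocks until a chunk is available; returns std::nullopt once closed and drained.
    std::optional<std::string> pop() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return closed_ || !q_.empty(); });
        if (q_.empty()) return std::nullopt;
        std::string chunk = std::move(q_.front());
        q_.pop();
        return chunk;
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<std::string> q_;
    bool closed_ = false;
};

// Runs `produce` on the calling (main) thread, which is where R's API must be used,
// and applies `compress` to each emitted chunk on a single background thread.
// Error handling (e.g. `produce` throwing) is omitted in this sketch.
template <typename Produce, typename Compress>
std::vector<std::string> serialize_and_compress(Produce produce, Compress compress) {
    ChunkQueue queue;
    std::vector<std::string> compressed;
    std::thread worker([&] {
        while (auto chunk = queue.pop()) compressed.push_back(compress(*chunk));
    });
    produce([&](std::string chunk) { queue.push(std::move(chunk)); });
    queue.close();
    worker.join(); // `compressed` is only read after the worker has finished
    return compressed;
}
```

With a single consumer thread the compressed chunks come out in the same order they were produced; using several compressor threads would additionally require tracking chunk indices.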

MarcusKlik • Oct 26 '19 20:10