unconf15
Random access/queriable serialization format for R objects
Serialized R objects are everywhere, from the files cluttering our workspaces to the data shipped with packages. Currently, however, such objects are "all or nothing": to get any piece of a saved object, or even to determine which objects are saved in a particular rda/RData file, we have to load the whole thing into memory.
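To make the "all or nothing" point concrete, here is a small sketch using only base R. The object names (`small`, `big`, `scratch`) are made up for illustration. Loading into a scratch environment keeps the workspace clean, but every object in the file is still restored just to learn its name.

```r
# Save two objects to a temporary .RData file.
path <- tempfile(fileext = ".RData")
small <- 1:10
big <- rnorm(1e5)
save(small, big, file = path)

# load() returns the names of the objects it restored, but only after
# restoring all of them. The scratch environment avoids clobbering the
# workspace; it does not avoid reading `big` into memory.
scratch <- new.env()
loaded_names <- load(path, envir = scratch)
print(loaded_names)
print(object.size(scratch$big), units = "Kb")
```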
It would be nice to have a serialization format amenable to inspection and "random" (in the access sense) subset retrieval.
Packages such as bigmemory offer something like this for matrices, but I'm talking about a general solution which could act as a swap-in replacement for save().
Self-describing data formats such as Avro (https://avro.apache.org/) and some form of external indexing akin to tabix are two approaches that seem promising.
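As a rough illustration of the external-index idea, the sketch below writes each object as its own `serialize()` blob into one file and keeps a separate table of byte offsets, so a single object can be read back with `seek()`. This is not tabix or Avro, only a toy in base R; `save_indexed()` and `read_indexed()` are hypothetical helper names, and the index is kept in memory here rather than stored next to the data file.

```r
# Hypothetical helper: write every element of a named list as a separate
# serialize() blob and return an index of byte offsets and sizes.
save_indexed <- function(objects, path) {
  blobs <- lapply(objects, serialize, connection = NULL)
  sizes <- as.numeric(vapply(blobs, length, integer(1), USE.NAMES = FALSE))
  con <- file(path, "wb")
  on.exit(close(con))
  for (blob in blobs) writeBin(blob, con)
  data.frame(
    name = names(objects),
    offset = cumsum(sizes) - sizes,  # start of each blob, counted from 0
    size = sizes,
    stringsAsFactors = FALSE
  )
}

# Hypothetical helper: read one object by name, touching only its bytes.
read_indexed <- function(name, index, path) {
  row <- index[index$name == name, ]
  stopifnot(nrow(row) == 1)
  con <- file(path, "rb")
  on.exit(close(con))
  seek(con, where = row$offset)
  unserialize(readBin(con, what = "raw", n = row$size))
}

path <- tempfile(fileext = ".bin")
index <- save_indexed(list(small = 1:10, alphabet = letters), path)

# The index answers "what is in this file?" without reading any object.
print(index$name)

# Only the bytes belonging to `small` are read and unserialized.
small_again <- read_indexed("small", index, path)
```

A real format would also need to handle subsets within an object (rows of a data frame, elements of a list), which this sketch does not attempt.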