fst
Lightning Fast Serialization of Data Frames for R
The new `arrow` package is now online, so we can add `arrow::read/write_parquet/arrow`. I don't mind contributing this, but I first need to finish off a couple of CRAN submissions.
I'm not sure whether it makes sense for `read_fst` to automatically return a `data.table` when the file was written from a `data.table`. I'd certainly love this behavior as if...
It looks like the conda-forge package for `fst` has not been able to successfully build on OSX since they [upgraded their dependencies back in July](https://github.com/conda-forge/r-fst-feedstock/pull/5#issuecomment-513542114), and there isn't any action...
And add a section on benchmarking on https://fstpackage.github.io
The conversion needs very little memory, as we can use the `rbind` functionality of `fst` to append chunks from the `csv` file. The resulting `fst` file would have random row...
This would reduce the memory footprint of writing to a `fst` file. It is also possible to use a `parLapply` approach, where data is generated in parallel, but serialized...
By setting the first parameter to a vector of file names.
The current version reads factors into R in full, even if I select only 1000 rows. That means that ALL factor levels are read in, which is extremely memory inefficient. An...
That would significantly reduce the overhead when these columns are selected, especially when they are selected in the order in which they were stored.
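For context, partial reads in `fst` go through `read_fst` with its `columns`, `from`, and `to` arguments. The following is a minimal sketch, assuming the `fst` package is installed; the data frame, column names, and sizes are made up for illustration and do not come from the issues above.

```r
library(fst)

# Hypothetical example data: a factor column with many distinct levels.
df <- data.frame(
  id    = 1:10000,
  group = factor(sprintf("g%05d", 1:10000)),
  value = as.numeric(1:10000)
)

# Write the data frame to a temporary fst file.
path <- tempfile(fileext = ".fst")
write_fst(df, path)

# Read only rows 1 to 1000 of two selected columns.
part <- read_fst(path, columns = c("group", "value"), from = 1, to = 1000)

nrow(part)           # number of rows actually read
nlevels(part$group)  # number of factor levels carried by the partial read
```

Comparing `nlevels(part$group)` with the number of rows read is one way to check how many factor levels a partial read brings into memory.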
The [asmlib](http://www.agner.org/optimize/#asmlib) library contains optimized code for common string functions (`strlen`, `strcpy`, `strcmp`) which are also used in `fst` (mainly for serialization of `character` columns).