fst issues

Add benchmark vs `arrow::read/write_parquet`

2

The new `arrow` package is now online, so we can add `arrow::read/write_parquet/arrow`. I don't mind contributing to this, but I will need to finish off a couple of CRAN submissions.

xiaodaigh

feature request

read_fst should automatically read as data.table if written from data.table

7

I'm not sure if it makes sense that `read_fst` automatically reads a file as a `data.table` if it is written from a `data.table`. I'd certainly love this behavior as if...

renkun-ken

feature request

Error in conda forge build

11

It looks like the conda-forge package for `fst` has not been able to successfully build on OSX since they [upgraded their dependencies back in July](https://github.com/conda-forge/r-fst-feedstock/pull/5#issuecomment-513542114), and there isn't any action...

brendanf

question

Benchmark on well-known data sets

3

And add a section on benchmarking on https://fstpackage.github.io

MarcusKlik

Convert a csv file directly to a fst file

6

The conversion needs very little memory, as we can use the `rbind` functionality of `fst` to append chunks from the `csv` file. The resulting `fst` file would have random row...

MarcusKlik

feature request

Write to fst binary file with an apply-like method

This would reduce the memory footprint of writing to a `fst` file. It is also possible to use a `parLapply` approach, where data is generated in parallel, but serialized...

MarcusKlik

feature request

read.fst reads multiple fst files into a single data set

By setting the first parameter to a vector of file names.

MarcusKlik

feature request

Option strings as factors=FALSE

1

The current version reads in factors to R, even if I select only 1000 rows. That means that ALL factor levels are read in, and this is extremely memory inefficient. An...

pmakai

feature request

Adjacent columns with identical types are stored as a matrix internally

1

That would significantly reduce the overhead when these columns are selected, especially when they are selected in the order in which they were stored

MarcusKlik

enhancement

Use asmlib library for common C++ string methods

Library [asmlib](http://www.agner.org/optimize/#asmlib) contains optimized C++ code for common string methods (`strlen`, `strcpy`, `strcmp`) which are also used in `fst` (mainly for serialization of `character` columns).

MarcusKlik

enhancement
