Mark Klik issues

Results: 66 issues of Mark Klik

A range can be specified with read.fst on sorted data frames

When a sorted data set is stored as a `fst` binary file, sorting metadata is stored alongside the data. Using this metadata, a binary search can be performed on the...

feature request
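The range lookup in the issue above relies on a binary search over sorted keys. A minimal Python sketch of that idea, assuming the sorted key column is already in memory; `row_range` is a hypothetical helper, not part of the `fst` API, and it does not show how `fst` stores or uses its sorting metadata.

```python
from bisect import bisect_left, bisect_right

def row_range(sorted_keys, low, high):
    """Return 0-based, half-open (first, last) row bounds with low <= key <= high."""
    first = bisect_left(sorted_keys, low)    # first row whose key is >= low
    last = bisect_right(sorted_keys, high)   # one past the last row whose key is <= high
    return first, last

keys = [1, 3, 3, 5, 8, 8, 8, 13]             # illustrative sorted key column
first, last = row_range(keys, 3, 8)
selected = keys[first:last]                  # only these rows would need to be read
```

Both lookups take a logarithmic number of comparisons, so only the matching row range has to be read afterwards.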

fst files can be streamed on a row by row basis

That would involve creating a `fst` file-connection object (similar to the base-R `file` function). With that object, data can be streamed row by row until the file is depleted (or the connection is...

feature request
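The connection-style streaming in the issue above can be pictured as a generator that yields rows until the underlying handle is depleted. A minimal Python sketch under stated assumptions: the fixed-width row layout `ROW` and the helper `stream_rows` are hypothetical and do not reflect the `fst` file format or API.

```python
import io
import struct

# Hypothetical row layout for illustration only: int32 id, float64 value.
ROW = struct.Struct("<id")

def stream_rows(handle):
    """Yield one decoded row at a time until the handle is depleted."""
    while True:
        chunk = handle.read(ROW.size)
        if len(chunk) < ROW.size:   # end of data (or a truncated trailing row)
            return
        yield ROW.unpack(chunk)

# An in-memory buffer stands in for an open file connection.
buffer = io.BytesIO(b"".join(ROW.pack(i, i * 0.5) for i in range(3)))
rows = list(stream_rows(buffer))
```

Because the generator holds only one row at a time, memory use stays constant regardless of file size.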

Use a fst file as a container for an ALTREP vector

`R` 3.5.0 brings some features from the `ALTREP` framework. One of those features is that the actual vector data can be stored in an alternative structure or location. Such a...

feature request
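The core idea in the issue above, a vector whose data lives somewhere other than ordinary memory, can be imitated in other languages. A minimal Python sketch, assuming a flat file of little-endian float64 values; `FileBackedVector` is a hypothetical class and has no relation to the `ALTREP` C API or to the `fst` format.

```python
import os
import struct
import tempfile

class FileBackedVector:
    """Sequence of float64 values that are read from disk only when indexed."""
    ITEM = struct.Struct("<d")

    def __init__(self, path):
        self.path = path
        self.length = os.path.getsize(path) // self.ITEM.size

    def __len__(self):
        return self.length

    def __getitem__(self, index):
        if not 0 <= index < self.length:
            raise IndexError(index)
        with open(self.path, "rb") as handle:
            handle.seek(index * self.ITEM.size)   # jump straight to the element
            return self.ITEM.unpack(handle.read(self.ITEM.size))[0]

# Write three values to a temporary file that acts as the backing store.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"".join(FileBackedVector.ITEM.pack(v) for v in (1.5, 2.5, 3.5)))

vec = FileBackedVector(tmp.name)
second = vec[1]          # only this element is read from disk
os.remove(tmp.name)      # clean up the backing file
```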

Fill a data.table range with specific rows from read.fst

With this feature you can populate, say, rows 1001:2000 of a 1e6-row `data.table` with a 1000-row read from `read.fst`. All this is done in memory. This feature is...

enhancement
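The in-memory fill in the issue above amounts to overwriting a row range of a preallocated table. A minimal Python sketch, assuming a column-oriented table held as lists; `read_rows` is a hypothetical stand-in for a partial file read and `fill_range` is a hypothetical helper, neither being part of `fst` or `data.table`.

```python
n_rows = 10_000  # smaller than the 1e6 rows mentioned in the text, to keep the example quick
table = {"id": [0] * n_rows, "value": [0.0] * n_rows}   # preallocated columns

def read_rows(start, stop):
    """Hypothetical stand-in for a partial read of 0-based rows start..stop-1."""
    return {"id": list(range(start, stop)),
            "value": [i / 10 for i in range(start, stop)]}

def fill_range(target, start, block):
    """Overwrite rows of target in place, beginning at 0-based row start."""
    for name, values in block.items():
        column = target[name]
        # Same-length slice assignment: the column keeps its length.
        column[start:start + len(values)] = values

# R rows 1001:2000 (1-based, inclusive) correspond to 0-based rows 1000..1999.
block = read_rows(1000, 2000)
fill_range(table, 1000, block)
```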

write.fst can write multidimensional matrices to file

This would also provide fast compression with random access to the matrix. Check whether there is a use case for such a feature.

feature request
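Compression with random access, as proposed in the issue above, is commonly achieved by compressing independent blocks so that a single cell needs only one block decompressed. A minimal Python sketch under stated assumptions: one byte per cell, row-major ordering (R itself is column-major), and `zlib` as the compressor; the helper names are hypothetical and this is not the `fst` on-disk layout.

```python
import zlib

def compress_blocks(values, block_size):
    """Compress a flat bytes buffer in independent fixed-size blocks."""
    return [zlib.compress(values[i:i + block_size])
            for i in range(0, len(values), block_size)]

def read_element(blocks, block_size, shape, index):
    """Return the cell at a multidimensional index, decompressing only one block."""
    offset = 0
    for dim, i in zip(shape, index):   # row-major flattening of the index
        offset = offset * dim + i
    block = zlib.decompress(blocks[offset // block_size])
    return block[offset % block_size]

shape = (4, 5, 6)
data = bytes(range(4 * 5 * 6))          # 120 one-byte cells with values 0..119
blocks = compress_blocks(data, block_size=32)
cell = read_element(blocks, 32, shape, (2, 3, 4))
```

Smaller blocks make single-cell reads cheaper but usually lower the compression ratio, so the block size is a tuning parameter.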

Use boost shared memory to access an in-memory `fst` table from multiple processes

See [here](http://www.boost.org/doc/libs/1_55_0/doc/html/interprocess/sharedmemorybetweenprocesses.html): Boost allows for the creation of memory shared between processes. For `fst`, that could mean that a single in-memory `fst` table can be shared between different processes. First...

enhancement
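The issue above refers to Boost.Interprocess, a C++ library. The same concept, a named memory segment that several handles can attach to, can be sketched with Python's standard `multiprocessing.shared_memory` module (Python 3.8+). This is an analogy rather than the Boost API, and both handles live in one process here to keep the example short; a second process would attach by the same name.

```python
from multiprocessing import shared_memory

# The owner creates a named segment and writes some bytes into it.
owner = shared_memory.SharedMemory(create=True, size=16)
owner.buf[:4] = bytes([10, 20, 30, 40])

# A second handle attaches by name and sees the same bytes without a copy
# being sent between the two handles.
reader = shared_memory.SharedMemory(name=owner.name)
seen = bytes(reader.buf[:4])

# Every handle is closed; the owner also unlinks the segment.
reader.close()
owner.close()
owner.unlink()
```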

Add accurate timing measurements to multi-threaded reads and writes

Performance measurements for OpenMP constructs

Parallel computation of multiple columns

Fast factorization of character columns