fst
Fill a data.table range with specific rows from read.fst
With this feature you can populate, say, rows 1001:2000 of a 1e6-row data.table with 1000 rows read by `read.fst`. All this is done in memory. This feature is very useful for combining data from multiple (`fst`) sources into a single result table without the overhead of copies. For example, when performing a merge sort on a set of data files, you need to:
1. read the first x rows from all files
2. sort the resulting table
3. write some rows to disk
4. read the next x rows from the file with the smallest first chunk
5. sort the resulting table
6. go to step 3
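The steps above can be sketched roughly as follows. This is only an illustration under several assumptions: `merge_sorted_fst` is a hypothetical helper (not part of `fst`), every input file is already sorted on a numeric key column without `NA` values, and the output is written as a sequence of part files because `write_fst` writes a whole file at a time. The sketch also interprets "the file with the smallest first chunk" as the file whose loaded rows end at the smallest key, since rows up to that key can no longer be preceded by unread rows.

```r
library(data.table)
library(fst)

# Hypothetical helper: merge fst files that are each sorted on the numeric
# column `key` into sorted part files in `out_dir`. Returns the part paths.
merge_sorted_fst <- function(paths, key, out_dir, chunk_rows = 1000L) {
  total    <- vapply(paths, function(p) metadata_fst(p)$nrOfRows, numeric(1))
  pos      <- numeric(length(paths))   # rows already read from each file
  last_key <- rep(Inf, length(paths))  # largest key loaded so far per file
  buffer   <- NULL                     # rows loaded but not yet written
  parts    <- character(0)
  refill   <- which(total > 0)         # first pass: read from all files
  if (!length(refill)) return(parts)

  repeat {
    # Read the next chunk from every file scheduled for a refill.
    for (i in refill) {
      to    <- min(pos[i] + chunk_rows, total[i])
      chunk <- read_fst(paths[i], from = pos[i] + 1, to = to,
                        as.data.table = TRUE)
      pos[i]      <- to
      last_key[i] <- chunk[[key]][nrow(chunk)]
      buffer      <- rbindlist(list(buffer, chunk))  # copies the buffer
    }
    setorderv(buffer, key)             # data.table's sort, by reference

    # Rows up to the smallest "last loaded key" among files that still have
    # unread rows are final: no unread row can sort before them.
    open     <- which(pos < total)
    bound    <- if (length(open)) min(last_key[open]) else Inf
    is_ready <- buffer[[key]] <= bound
    if (any(is_ready)) {
      part  <- file.path(out_dir, sprintf("part_%05d.fst", length(parts) + 1L))
      write_fst(buffer[is_ready], part)
      parts  <- c(parts, part)
      buffer <- buffer[!is_ready]
    }
    if (!length(open)) break
    refill <- open[which.min(last_key[open])]  # file that ran out first
  }
  parts
}
```

With this scheme the buffer never holds more than `chunk_rows` rows per input file. Note that `rbindlist` copies the buffer on every refill; writing the new rows directly into a preallocated range of the result table is exactly the copy this feature would remove.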
This can be performed efficiently in R by using `data.table`'s fast sorting and populating the result table in memory. With such an algorithm operating on a collection of `fst` files, we basically have a method of sorting arbitrarily large `fst` files without running out of memory (and it can be done with multiple threads!).
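For comparison, the range fill itself can be approximated with functions that already exist: `read_fst` with its `from` and `to` arguments (the current name of `read.fst`) and `data.table::set`, which assigns by reference. The sketch below is a minimal illustration, not the proposed feature: the file, the column names `id` and `value`, and the row ranges are made up for the example, and the chunk is still materialised as an intermediate data frame before being copied into place.

```r
library(data.table)
library(fst)

# Example input file (hypothetical data, written only for this sketch).
path <- tempfile(fileext = ".fst")
write_fst(data.frame(id = 1:5000, value = as.numeric(1:5000) / 2), path)

# Preallocate the result table once, with matching column types.
result <- data.table(id = integer(1e6), value = numeric(1e6))

# Read 1000 rows from the file; this is the intermediate copy that a
# direct "read into range" feature would avoid.
chunk <- read_fst(path, from = 2001, to = 3000)

# Write the chunk into rows 1001:2000 of the result, column by column.
# set() modifies `result` by reference, so the table itself is not copied.
rows <- 1001:2000
for (col in names(chunk)) {
  set(result, i = rows, j = col, value = chunk[[col]])
}
```

Only the 1000-row chunk is allocated here; the 1e6-row table is modified in place.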