
Conditional read on an fst file

Open · MarcusKlik opened this issue 8 years ago · 1 comment

By specifying a condition on one or more columns of the stored table, data can be read using far less memory than a full read followed by a selection of rows. This is related to issues #15 and #16: the data can be read through a stream object, and the selection can be applied to each chunk as it is read instead of to the complete data set. Restrictions:

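  • The condition cannot contain aggregate expressions that depend on the whole data set, e.g. median(ColA) or sum(ColA), because each chunk is evaluated without access to the other chunks.
  • The size of the result is not known in advance, so the smaller per-chunk result sets must be bound together afterwards (as with data.table's rbindlist). This extra binding and copying step will cost some performance.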
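A minimal C++ sketch of the chunked read described above, assuming the stored table can be read block by block. `Table`, `read_chunk`, `filter_chunk`, `bind_rows` and `conditional_read` are hypothetical names for illustration, not part of fst's API, and the "file" is simulated by an in-memory table. The condition is a row-wise predicate (so no median/sum over the whole column), and the per-chunk results are concatenated at the end, rbindlist-style, because their total size is only known after all chunks have been filtered.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical in-memory stand-in for a two-column table stored on disk.
struct Table {
    std::vector<int> colA;
    std::vector<double> colB;
    std::size_t nrow() const { return colA.size(); }
};

// Hypothetical stand-in for reading rows [from, to) of the stored table.
// A real reader would read and decompress only this block from the file.
Table read_chunk(const Table& stored, std::size_t from, std::size_t to) {
    Table chunk;
    chunk.colA.assign(stored.colA.begin() + from, stored.colA.begin() + to);
    chunk.colB.assign(stored.colB.begin() + from, stored.colB.begin() + to);
    return chunk;
}

// Row-wise condition: it only sees the values of a single row, so it can be
// evaluated chunk by chunk (aggregates over the whole column are not possible).
using RowPredicate = std::function<bool(int colA, double colB)>;

// Keep only the rows of one chunk that satisfy the condition.
Table filter_chunk(const Table& chunk, const RowPredicate& pred) {
    Table out;
    for (std::size_t i = 0; i < chunk.nrow(); ++i) {
        if (pred(chunk.colA[i], chunk.colB[i])) {
            out.colA.push_back(chunk.colA[i]);
            out.colB.push_back(chunk.colB[i]);
        }
    }
    return out;
}

// Concatenate the per-chunk results (similar in spirit to rbindlist).
// The total size is only known once every chunk has been filtered.
Table bind_rows(const std::vector<Table>& parts) {
    std::size_t total = 0;
    for (const auto& p : parts) total += p.nrow();
    Table out;
    out.colA.reserve(total);
    out.colB.reserve(total);
    for (const auto& p : parts) {
        out.colA.insert(out.colA.end(), p.colA.begin(), p.colA.end());
        out.colB.insert(out.colB.end(), p.colB.begin(), p.colB.end());
    }
    return out;
}

// Assumes chunk_size > 0. Only one unfiltered chunk is held at a time;
// only the selected rows of each chunk are kept until the final bind.
Table conditional_read(const Table& stored, std::size_t chunk_size,
                       const RowPredicate& pred) {
    std::vector<Table> parts;
    for (std::size_t from = 0; from < stored.nrow(); from += chunk_size) {
        std::size_t to = std::min(from + chunk_size, stored.nrow());
        parts.push_back(filter_chunk(read_chunk(stored, from, to), pred));
    }
    return bind_rows(parts);
}
```

For example, `conditional_read(t, 3, [](int a, double) { return a > 4; })` reads `t` in chunks of three rows and returns only the rows where `colA > 4`, in their original order.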

MarcusKlik, Feb 28 '17 20:02

On the other hand, because we read the data in separate chunks anyway, a conditional read is well suited to a multi-threaded implementation (each chunk can be filtered independently of the others), provided we can implement the conditional statements in C++.
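A rough sketch of the multi-threaded variant, assuming each chunk can be read and filtered independently. To keep it short, a single integer column stands in for the table, and `filter_range` and `parallel_conditional_read` are hypothetical names, not fst functions. Each chunk writes to its own result slot, so no locking is needed, and binding the slots in chunk order preserves the original row order.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Simplification: one integer column stands in for the stored table.
using Column = std::vector<int>;
// The predicate must be safe to call from several threads at once.
using RowPredicate = std::function<bool(int)>;

// Filter rows [from, to). In a real reader, each thread would first
// read and decompress its own chunk from the file.
Column filter_range(const Column& stored, std::size_t from, std::size_t to,
                    const RowPredicate& pred) {
    Column out;
    for (std::size_t i = from; i < to; ++i)
        if (pred(stored[i])) out.push_back(stored[i]);
    return out;
}

// Assumes chunk_size > 0 and n_threads > 0.
Column parallel_conditional_read(const Column& stored, std::size_t chunk_size,
                                 unsigned n_threads, const RowPredicate& pred) {
    const std::size_t n_chunks = (stored.size() + chunk_size - 1) / chunk_size;
    // One result slot per chunk: no two threads write to the same slot.
    std::vector<Column> parts(n_chunks);
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < n_threads; ++t) {
        workers.emplace_back([&, t] {
            // Round-robin assignment: thread t handles chunks t, t + n_threads, ...
            for (std::size_t c = t; c < n_chunks; c += n_threads) {
                std::size_t from = c * chunk_size;
                std::size_t to = std::min(from + chunk_size, stored.size());
                parts[c] = filter_range(stored, from, to, pred);
            }
        });
    }
    for (auto& w : workers) w.join();
    // Bind the per-chunk results in chunk order (rbindlist-style).
    Column result;
    for (const auto& p : parts) result.insert(result.end(), p.begin(), p.end());
    return result;
}
```

The round-robin split is only one option; a shared atomic chunk counter would balance the load better when chunks differ in size. The binding step from the second restriction in the issue is still required here.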

MarcusKlik, Feb 28 '17 20:02