fsttable Feature request: conditional select doesn't work

  library(fsttable)
  nr_of_rows <- 5
  x <- data.table::data.table(X = 1:nr_of_rows, Y = LETTERS[1 + (1:nr_of_rows) %% 26])
  fst::write_fst(x, "1.fst")
  ft <- fst_table("1.fst")
  ft
  ft[X == 1]  # this doesn't work

I really like this package, thanks!

Sep 25 '19 16:09 ssh352

Hi @ssh352, thanks for reporting the issue!

Yes, expressions in the i argument are not implemented yet. With fsttable, the challenge is to minimize RAM usage when evaluating an expression such as

ft[i, j, by = "z"].

A first strategy might be:

1. Scan the arguments i, j and by for column names
2. Read the columns needed by selector i
3. Calculate the selector i (as an integer or logical vector)
4. Discard all loaded columns not needed by j or by
5. Load the (extra) columns needed for the by argument
   - Apply selector (i) after reading each column
6. Calculate the grouping mask from the by argument
7. Discard loaded columns not needed by j
8. Load the (extra) columns needed by j
   - Apply selector (i) after reading each column
9. Process j, taking into account the grouping mask (from by)
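
As a rough illustration of the steps above, here is a minimal sketch written against plain fst and data.table. It is not fsttable code: `staged_query` is a hypothetical helper, the example file and its columns are made up, and it hands the grouping to data.table instead of building an explicit grouping mask.

```r
library(data.table)

# Hypothetical helper (not part of fsttable): evaluates something like
# ft[i, j, by] while reading one column at a time from the fst file.
staged_query <- function(path, i_expr, j_expr, by_cols) {
  # scan i and j for names that are columns of the file
  file_cols <- fst::metadata_fst(path)$columnNames
  i_cols <- intersect(all.vars(i_expr), file_cols)
  j_cols <- intersect(all.vars(j_expr), file_cols)

  # read only the columns used by i and compute the row selector;
  # the columns read here are temporary and can be freed afterwards
  rows <- which(eval(i_expr, fst::read_fst(path, columns = i_cols)))

  # read the by and j columns, applying the selector right after each read
  needed <- union(by_cols, j_cols)
  dt <- as.data.table(lapply(
    setNames(needed, needed),
    function(col) fst::read_fst(path, columns = col)[[1]][rows]
  ))

  # let data.table do the grouping and evaluate j per group
  dt[, eval(j_expr), by = by_cols]
}

# Example with made-up data
path <- tempfile(fileext = ".fst")
fst::write_fst(
  data.frame(X = 1:6, Y = rep(c("a", "b"), 3), stringsAsFactors = FALSE),
  path
)
res <- staged_query(path, quote(X > 2), quote(list(n = .N, total = sum(X))), "Y")
print(res)
```

Note that each column is still read in full before the selector is applied, so the peak memory use is roughly one full column plus the already-filtered columns.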

If needed, columns needed in more than one place can be discarded and reloaded to save RAM.

In a second stage, the combination of the grouping mask and the selector can be passed to the underlying fst library so that it reads only the required rows. That way, we don't need to load complete columns from disk, and the grouping is done automatically, saving memory.

In a third stage, j can be applied to smaller (vertical) chunks of the dataset by loading only a few groups (by argument) at a time.
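
To make the chunking idea a bit more concrete, here is a small sketch using only the `from` and `to` arguments of `fst::read_fst`. It assumes the file is already sorted by the grouping column, so each group occupies a contiguous row range; that assumption, the helper name `chunked_group_sum` and the example data are mine, not something fsttable provides.

```r
library(data.table)

# Hypothetical helper: sums `value_col` per group, reading a few groups at a time.
# Assumes the rows in the file are sorted by `by_col`.
chunked_group_sum <- function(path, by_col, value_col, groups_per_chunk = 2L) {
  # only the grouping column is read at full length
  by_vals <- fst::read_fst(path, columns = by_col)[[1]]
  runs <- rle(as.character(by_vals))
  ends <- cumsum(runs$lengths)
  starts <- ends - runs$lengths + 1L

  # split the group indices into batches of `groups_per_chunk` groups
  batches <- split(seq_along(ends), ceiling(seq_along(ends) / groups_per_chunk))

  rbindlist(lapply(batches, function(g) {
    # read just the row range that covers this batch of groups
    chunk <- fst::read_fst(
      path,
      columns = c(by_col, value_col),
      from = starts[g[1]],
      to = ends[g[length(g)]],
      as.data.table = TRUE
    )
    chunk[, .(total = sum(.SD[[1L]])), by = by_col, .SDcols = value_col]
  }))
}

# Example with made-up, pre-sorted data
path <- tempfile(fileext = ".fst")
fst::write_fst(
  data.frame(X = 1:6, Y = rep(c("a", "b", "c"), each = 2), stringsAsFactors = FALSE),
  path
)
res <- chunked_group_sum(path, by_col = "Y", value_col = "X")
print(res)
```

For unsorted files the row ranges of a group are not contiguous, so this simple approach would not apply as written.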

In a later stage, this processing of chunked datasets should be possible during background loading of the next chunk, to further increase performance.

This was just to order my thoughts, thanks again for submitting the issue!

Sep 26 '19 13:09 MarcusKlik

@MarcusKlik Hi Marcus, are there any updates regarding the group-by functionality? I have a very big fst file, which I loaded with fsttable, and it was very fast. Now, the challenge involves some merge and by operations with normal data.tables. I was thinking of extracting the relevant columns. The grouping and merge functionality would be very useful.

Aug 14 '20 02:08 akrun1

Hi @akrun1, thanks for your request. Unfortunately, the group-by functionality is not implemented yet. If reading the fst file gets you into memory problems, your suggestion to read a selection of columns before each grouping is probably your best option!
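
For example, something along these lines should work with plain fst and data.table. This is only a sketch: the file, the column names (`id`, `value`, `unused`) and the `lookup` table are made up for illustration.

```r
library(data.table)

# Made-up example file with one column that the query does not need
path <- tempfile(fileext = ".fst")
fst::write_fst(
  data.frame(
    id = c(1L, 2L, 1L, 3L),
    value = c(10, 20, 30, 40),
    unused = letters[1:4],
    stringsAsFactors = FALSE
  ),
  path
)

# Read only the columns needed for the grouping, directly as a data.table
dt <- fst::read_fst(path, columns = c("id", "value"), as.data.table = TRUE)

# Group in memory, then merge with a normal data.table
totals <- dt[, .(total = sum(value)), by = id]
lookup <- data.table(id = 1:3, label = c("x", "y", "z"))
merged <- merge(totals, lookup, by = "id")
print(merged)
```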

thanks

Aug 18 '20 18:08 MarcusKlik