rbind/cbind two fsttable objects, or a fsttable with a data.frame or data.table

Open akrun1 opened this issue 5 years ago • 3 comments

I would like to rbind two fsttable objects, or a single fsttable with a data.frame. What would be the preferred method?

library(fsttable)
library(data.table)
ft1 <- fst_table('1.fst')
rbindlist(list(ft1, ft1[1:10]))
.table_proxy X Y
1: <tableproxy[2]> 0 0
2: <tableproxy[2]> 0 0

For creating a new column/updating, I tried

 ft1[1:4, .(X)] *4
    X
1:  4
2:  8
3: 12
4: 16

If I update using data.table methods, it results in an error

new <- (ft1[1:4, .(X)] * 4)[[1]]
ft1[1:4, new := new]
Error in parse_j(j, tbl_proxy$remotetablestate$colnames, parent.frame()) : 
  j must be a list

Is there a preferred method for modifying/updating columns? I read some previous issues here and here, and I wonder whether there have been any updates since. Thanks.

PS: My objective is to update an already loaded fsttable object without converting it to a data.frame/data.table, add new rows, and write it back as a .fst file (after doing some join operations).
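For reference, the update-and-write-back part of this objective can be sketched with the fst and data.table packages directly. This is only a fallback under stated assumptions, not an fsttable feature: it loads the table into memory, which is exactly what the question hopes to avoid. The temporary file and its contents are hypothetical stand-ins for '1.fst'.

```r
library(fst)
library(data.table)

# Hypothetical sample file standing in for '1.fst'
path <- tempfile(fileext = ".fst")
write_fst(data.frame(X = 1:10, Y = 0), path)

# Load as a data.table (assumption: the table fits in memory)
dt <- read_fst(path, as.data.table = TRUE)

# Add a column by reference for the first four rows; other rows become NA
dt[1:4, new := X * 4]

# Write the modified table back to the same .fst file
write_fst(dt, path)

# Re-read to confirm the new column was stored
updated <- read_fst(path, as.data.table = TRUE)
```

Because `:=` works on an in-memory data.table, this sidesteps the `parse_j` error but gives up the on-disk proxy behaviour of fsttable.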

akrun1 avatar Aug 14 '20 16:08 akrun1

I also tried comparing read efficiency and select/subset performance between fsttable and tidyft. Both read the dataset (.fst, 10328208 x 35) very efficiently, but the later steps are costly in tidyft. If there are ways to do this efficiently in fsttable, that would be great.

Screen Shot 2020-08-15 at 12 17 22 AM

akrun1 avatar Aug 15 '20 05:08 akrun1

Hi @akrun1, thanks for your feature request!

Unfortunately, fsttable does not have rbindlist or cbind functionality at the moment, as it is in its first experimental stages (and not actively developed right now). But it would certainly be a requirement for a fully functional data.table interface.

Thanks, I'll add your issue as a feature request!
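In the meantime, one possible workaround is to bypass fsttable and combine the rows with fst and data.table themselves. The sketch below mirrors the `rbindlist(list(ft1, ft1[1:10]))` call from the question; it assumes both pieces fit in memory, and the file names are hypothetical temporary files.

```r
library(fst)
library(data.table)

# Hypothetical sample file standing in for '1.fst'
path <- tempfile(fileext = ".fst")
write_fst(data.frame(X = 1:20, Y = 0), path)

# Read the full table and, separately, only its first ten rows from disk
full  <- read_fst(path, as.data.table = TRUE)
first <- read_fst(path, from = 1, to = 10, as.data.table = TRUE)

# Row-bind in memory, matching columns by name
combined <- rbindlist(list(full, first), use.names = TRUE)

# Store the result as a new .fst file
out <- tempfile(fileext = ".fst")
write_fst(combined, out)
```

The `from` and `to` arguments of `read_fst` limit the read to the requested rows, so only the pieces that are actually bound need to be loaded.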

MarcusKlik avatar Aug 18 '20 17:08 MarcusKlik

@MarcusKlik Thank you for the reply. I tried some other packages (tidyft, arrow and disk.frame). One of the main advantages of your package fsttable is that it is so fast at slicing. With tidyft, as soon as I use select_fst and do some operations, it loses that advantage because it pulls the data into memory. With disk.frame, I split the data into multiple CSV files, but it still takes a lot of time to read the data and put it into .fst files.

akrun1 avatar Aug 18 '20 22:08 akrun1