update metadata from R

Open nolanp2 opened this issue 6 years ago • 2 comments

It would be nice to have a way to update metadata, such as column names, on a file without having to overwrite the whole dataset. For example, a feature that lets you call

fst.metadata(fst_data)$columnNames <- paste0(fst.metadata(fst_data)$columnNames,"_updated")

(currently gives an error)

Maybe there's already a way to do this, but I can't find it.
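
For context, the full-rewrite route that this request is trying to avoid looks roughly like the sketch below. It assumes a recent fst release in which `metadata_fst()`, `read_fst()` and `write_fst()` are the exported names (rather than the older `fst.metadata()` style); the temporary file and its columns are made up for illustration.

```r
# Sketch of the full-rewrite workaround (assumes the fst package is installed).
library(fst)

# Hypothetical example file, created here only so the sketch is self-contained.
path <- tempfile(fileext = ".fst")
write_fst(data.frame(a = 1:3, b = c("x", "y", "z")), path)

# Column names can be read from the metadata without loading the data ...
old_names <- metadata_fst(path)$columnNames

# ... but changing them means loading and rewriting the whole dataset.
dt <- read_fst(path)
names(dt) <- paste0(old_names, "_updated")
write_fst(dt, path)

new_names <- metadata_fst(path)$columnNames
print(new_names)
```

The cost of this approach grows with the size of the dataset, which is why an in-place metadata update would be useful.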

nolanp2, Aug 23 '18 13:08

Hi @nolanp2, thanks for your request!

Yes, it would definitely be nice to have a method like dplyr's rename() or data.table's setnames() to change the stored column names in the fst file to new values!

That's not possible yet, but several other planned features (such as row- and column-binding) will also need to update data in the currently immutable fst file. The format is prepared for overwriting blocks of data with new ones, even if they are larger than the original (e.g. longer column names).

As a first step, the new column names block could overwrite the current one, with any extra bytes appended to the end of the fst file (in an additional data block that requires 2 extra file seeks when loading). Obviously, when many such operations are performed, the extra file seeks can slow down loading, but the effect will be very small on modern SSDs.
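
A minimal sketch of that overwrite-plus-overflow idea in base R. This is a toy model only: a raw vector stands in for the file, and the header layout, `make_image()`, `rename_columns()` and `read_columns()` are all hypothetical, not the real fst format or API.

```r
# Toy layout (hypothetical, NOT the fst format):
# [cap][used][ovf_pos][ovf_len] as 4 x int32 | names block (cap bytes) | data
HEADER <- 16L

int_bytes <- function(x) writeBin(as.integer(x), raw(), size = 4L, endian = "little")
read_ints <- function(img) {
  readBin(img[1:HEADER], "integer", n = 4L, size = 4L, endian = "little")
}

# Build the initial "file": the names block is sized exactly for the first names.
make_image <- function(col_names, data_bytes) {
  block <- charToRaw(paste(col_names, collapse = "\n"))
  cap <- length(block)
  c(int_bytes(c(cap, cap, 0L, 0L)), block, data_bytes)
}

rename_columns <- function(img, new_names) {
  cap <- read_ints(img)[1]
  block <- charToRaw(paste(new_names, collapse = "\n"))
  used <- min(length(block), cap)
  overflow <- block[seq_len(length(block) - used) + used]

  # Overwrite the existing names block in place; the data bytes are untouched.
  if (used > 0) img[HEADER + seq_len(used)] <- block[seq_len(used)]

  # Bytes that do not fit are appended to the end as an additional block.
  ovf_pos <- 0L
  if (length(overflow) > 0) {
    ovf_pos <- length(img)  # 0-based offset of the appended block
    img <- c(img, overflow)
  }
  img[1:HEADER] <- int_bytes(c(cap, used, ovf_pos, length(overflow)))
  img
}

read_columns <- function(img) {
  h <- read_ints(img)
  block <- img[HEADER + seq_len(h[2])]
  # Reading the overflow models the extra jump to the end of the file and back.
  if (h[4] > 0) block <- c(block, img[h[3] + seq_len(h[4])])
  strsplit(rawToChar(block), "\n", fixed = TRUE)[[1]]
}

data_bytes <- as.raw(1:10)
img  <- make_image(c("a", "b"), data_bytes)
img2 <- rename_columns(img, c("a_updated", "b_updated"))
print(read_columns(img2))
```

In this toy version each rename that overflows leaves the previous overflow block behind as dead bytes, so repeated renames grow the file a little each time.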

Thanks for submitting your feature request!

MarcusKlik, Aug 23 '18 21:08

Sounds good, I'll look forward to it. Excellent package by the way!

nolanp2, Aug 24 '18 12:08