fst
update metadata from R
It would be nice to have a way of updating metadata, such as column names, in a file without having to overwrite the whole dataset. For example, a feature that lets you call
fst.metadata(fst_data)$columnNames <- paste0(fst.metadata(fst_data)$columnNames,"_updated")
(currently gives an error)
Maybe there's already a way to do this, but I can't find it.
Hi @nolanp2, thanks for your request!
Yes, it would definitely be nice to have a method like dplyr's rename() or data.table's setnames() to change the stored column names in the fst file to new values!
That's not possible yet, but several other planned features will also need to update data in the (currently immutable) fst file, such as row- and column-binding. The format is already prepared for overwriting blocks of data with new ones, even if they are larger than the original (e.g. longer column names).
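Until in-place updates exist, the only route is the full rewrite the request wants to avoid. A minimal sketch of that workaround, assuming a recent fst release that provides read_fst(), write_fst() and metadata_fst() (the thread above uses the older fst.metadata() name); rename_fst_columns() is a hypothetical helper, not part of the package:

```r
library(fst)

# Hypothetical helper (not part of fst): rename columns by loading the
# whole table into memory and writing it back under the new names.
rename_fst_columns <- function(path, new_names) {
  data <- read_fst(path)                      # full read of the dataset
  stopifnot(length(new_names) == ncol(data))  # one new name per column
  names(data) <- new_names
  write_fst(data, path)                       # full rewrite of the file
  invisible(path)
}

# Usage on a small temporary file
path <- tempfile(fileext = ".fst")
write_fst(data.frame(a = 1:3, b = c("x", "y", "z")), path)

old_names <- metadata_fst(path)$columnNames
rename_fst_columns(path, paste0(old_names, "_updated"))
metadata_fst(path)$columnNames
```

The cost grows with the size of the dataset, which is exactly why a metadata-only update would be useful.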
As a first step, the new column names block could overwrite the current column names block, and any extra bytes needed could be appended to the end of the fst file (in an additional data block that requires 2 extra file seeks when loading). When many such operations are performed, the extra file seeks can slow down loading, but the effect will be very small on modern SSDs.
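To make the idea concrete, here is a rough sketch in base R of the overwrite-plus-overflow scheme. It uses a made-up toy layout held in a raw vector, not the real fst format: a 16-byte header (block capacity, bytes used in place, overflow offset, overflow length), then the names block, then the data. All function names and the layout are assumptions for illustration only.

```r
HEADER <- 16L  # toy header: four 4-byte little-endian integers

int_to_raw <- function(x) writeBin(as.integer(x), raw(), size = 4L, endian = "little")
raw_to_int <- function(r) readBin(r, what = "integer", n = 1L, size = 4L, endian = "little")

# Build a toy "file image": header, names block (newline separated), payload.
create_image <- function(col_names, payload) {
  block <- charToRaw(paste(col_names, collapse = "\n"))
  cap <- length(block)
  c(int_to_raw(cap), int_to_raw(cap), int_to_raw(0L), int_to_raw(0L), block, payload)
}

read_header <- function(img) {
  list(capacity        = raw_to_int(img[1:4]),
       used            = raw_to_int(img[5:8]),
       overflow_offset = raw_to_int(img[9:12]),
       overflow_length = raw_to_int(img[13:16]))
}

# Read names: in-place bytes first, then the overflow block if there is one.
read_names <- function(img) {
  h <- read_header(img)
  bytes <- img[HEADER + seq_len(h$used)]
  if (h$overflow_length > 0L) {
    bytes <- c(bytes, img[h$overflow_offset + seq_len(h$overflow_length)])
  }
  strsplit(rawToChar(bytes), "\n", fixed = TRUE)[[1]]
}

# Overwrite the names block; bytes that do not fit are appended at the end.
# The payload is never moved. An older overflow block, if any, is left behind
# as dead space and a fresh one is appended.
rename_in_place <- function(img, new_names) {
  h <- read_header(img)
  block <- charToRaw(paste(new_names, collapse = "\n"))
  in_place <- min(length(block), h$capacity)
  extra <- block[in_place + seq_len(length(block) - in_place)]
  img[HEADER + seq_len(in_place)] <- block[seq_len(in_place)]
  overflow_offset <- if (length(extra) > 0L) length(img) else 0L
  img[5:16] <- c(int_to_raw(in_place), int_to_raw(overflow_offset),
                 int_to_raw(length(extra)))
  c(img, extra)
}

# Usage: longer names spill into an appended block, the data stays put.
img  <- create_image(c("a", "b"), as.raw(1:10))
img2 <- rename_in_place(img, c("a_updated", "b_updated"))
read_names(img2)
```

Reading the names back touches the original block and then the appended block, which is where the extra seeks on a real file would come from.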
Thanks for submitting your feature request!
Sounds good, I'll look forward to it. Excellent package by the way!