Creating fst from existing data
Is there any way to create an fst file from existing data? I have a file that has the data, but let's say it's too big for me to read into R. Is there any way I can convert it to fst so that I can read the data in R?
Have a look at #178, does this help?
Hi @js430, thanks for your question!
As @zauster suggests, some of the solutions in #178 might help you. In your case, you have a (I'm guessing) csv file that you cannot read into RAM in one go, right?
So your best option is to split the file into several smaller files and then convert those to fst files. The splitting can be done with one of the many csv splitter tools that are available (current options to split csv files from R are limited). Once you have the set of fst files, you can do your analysis column-wise, using fst to load only the columns that you need:
# generate several csv files
csv_files <- sapply(1:7, function(x) {
  csv_file <- paste0(x, ".csv")
  data.table::fwrite(data.frame(X = 1:1e6, Y = 1e6:1), csv_file)
  csv_file
})
# method to convert a set of csv files to fst files
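csv_to_fst <- function(csv_files, compress = 50) {
  sapply(csv_files, function(csv_file) {
    # pass the compression setting on to write_fst
    fst::write_fst(data.table::fread(csv_file), paste0(csv_file, ".fst"),
      compress = compress)
  })
  # return the names of the fst files that were written
  paste0(csv_files, ".fst")
}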
# convert the generated files
fst_files <- csv_to_fst(csv_files)
# method to read data from multiple fst files
read_fst_files <- function(fst_files, columns) {
  data.table::rbindlist(
    lapply(fst_files, fst::read_fst, columns = columns)
  )
}
# read a single column from multiple fst files
read_fst_files(fst_files, "X")
#> X
#> 1: 1
#> 2: 2
#> 3: 3
#> 4: 4
#> 5: 5
#> ---
#> 6999996: 999996
#> 6999997: 999997
#> 6999998: 999998
#> 6999999: 999999
#> 7000000: 1000000
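If you don't have a csv splitter at hand, the splitting step can also be sketched in R by reading the csv file in row chunks with the `skip` and `nrows` arguments of `data.table::fread` and writing each chunk to its own fst file. This is only a rough sketch: the function name `csv_to_fst_chunks`, the `chunk_rows` argument and the output file naming are made up for illustration. It assumes a csv file with a single header line and no line breaks inside quoted fields. Each call has to skip over the rows that were already read, so this will probably be slower than a dedicated splitter on very large files, and column types are guessed per chunk, so they might differ between chunks.

```r
# Sketch: convert one large csv file to several fst files, chunk by chunk.
# csv_to_fst_chunks and chunk_rows are hypothetical names, not part of fst.
csv_to_fst_chunks <- function(csv_file, chunk_rows = 1e6, compress = 50) {
  # read only the header line to get the column names
  col_names <- names(data.table::fread(csv_file, nrows = 0))

  fst_files <- character(0)
  skip <- 1  # number of lines to skip, starting with the header line

  repeat {
    # fread throws an error when skip points past the end of the file,
    # this is treated as 'no more data' here
    chunk <- tryCatch(
      data.table::fread(csv_file, skip = skip, nrows = chunk_rows,
        header = FALSE, col.names = col_names),
      error = function(e) NULL
    )

    if (is.null(chunk) || nrow(chunk) == 0) break

    # write the chunk to its own fst file
    fst_file <- paste0(csv_file, ".", length(fst_files) + 1, ".fst")
    fst::write_fst(chunk, fst_file, compress = compress)
    fst_files <- c(fst_files, fst_file)

    # a short chunk means the end of the file was reached
    if (nrow(chunk) < chunk_rows) break

    skip <- skip + chunk_rows
  }

  fst_files
}
```

The returned file names can be passed to `read_fst_files()` from the example above. Note that the `tryCatch` also hides genuine parsing errors, so for real data you would want to check that the total number of rows matches what you expect.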
I hope that helps!