Can't read 0 columns of a fst file
> read.fst(..., columns = character())
Error in res$resTable[[1]] : subscript out of bounds
If you're wondering why I'd want to do this, it's that I just wanted to compute how many rows are in a large file as fast as possible.
@Kodiologist, if that is the only reason, you can simply use fst::metadata_fst()
(you can try it on the example code)
fst::metadata_fst(fst_file)$nrOfRows
The documentation linked above is not too explicit about this, but you can check the structure of the classed list returned by metadata_fst() via
str(fst::metadata_fst(fst_file))
Alternatively, you can also use fst::fst()
ft <- fst::fst(fst_file)
nrow(ft)
which is, however, an indirect method based on fst::metadata_fst() and may add some (minor) overhead.
On the other hand, it might be good if fst::read_fst() handled a 0-length column selection and consistently returned a data.frame with the expected number of rows and 0 columns.
Oh, I'd neglected the metadata_fst and fst functions. Thanks. In that case, I guess there's no real need to make this work, although the error message could probably be better.
Hi @Kodiologist, thanks for your question! As @riccardoporreca shows in his comment, most metadata can be retrieved from a fst file by applying the usual functions on the corresponding fst table object:
library(magrittr)  # provides the %>% pipe used below
tmp_file <- tempfile(fileext = ".fst")
# write some data to a fst file
data.frame(X = 1:10) %>%
fst::write_fst(tmp_file)
# get a reference to the fst file store
ft <- fst::fst(tmp_file)
# the number of rows
nrow(ft)
#> [1] 10
# or number of columns
ncol(ft)
#> [1] 1
# column names
colnames(ft)
#> [1] "X"
# row names
rownames(ft)
#> [1] "1" "2" "3" "4" "5" "6" "7" "8" "9" "10"
The dev version now returns a zero-column table:
fst::read_fst(tmp_file, character())
#> data frame with 0 columns and 0 rows
Would you prefer to have this function return a zero-column, 10-row table, as with a data.frame?
data.frame(X = 1:10)[, character()]
#> data frame with 0 columns and 10 rows
(the behavior of data.frame, data.table and tibble is not very consistent in this case)
data.frame(X = 1:10)[, character()]
#> data frame with 0 columns and 10 rows
tibble::tibble(X = 1:10)[, character()]
#> # A tibble: 10 x 0
data.table::data.table(X = 1:10)[, .()]
#> Null data.table (0 rows and 0 cols)
Thanks for the tips. My impression is that returning a 0-column, n-row data frame (or data table) is most logical, because it's consistent with the usual case of selecting nonzero columns.
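In the meantime, that behavior can be approximated on the user side. The following is only a minimal sketch, not part of fst: the wrapper name read_fst_keep_rows is hypothetical. It assumes that fst::metadata_fst()$nrOfRows holds the row count, as used earlier in this thread, and builds the 0-column, n-row data.frame in base R.

```r
# Hypothetical helper (not part of fst): behaves like fst::read_fst(), except
# that an empty column selection yields a 0-column data.frame that still has
# the file's number of rows.
read_fst_keep_rows <- function(path, columns = NULL) {
  if (!is.null(columns) && length(columns) == 0L) {
    # Only the metadata is read here, no column data.
    n <- fst::metadata_fst(path)$nrOfRows
    # A data.frame with n rows and no columns.
    return(data.frame(row.names = seq_len(n)))
  }
  fst::read_fst(path, columns = columns)
}

# Usage on a small temporary file
tmp_file <- tempfile(fileext = ".fst")
fst::write_fst(data.frame(X = 1:10), tmp_file)
res <- read_fst_keep_rows(tmp_file, character())
dim(res)
```

With a non-empty selection (or columns = NULL) the call is passed straight through to fst::read_fst(), so the wrapper only changes the empty-selection case.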
Hi @Kodiologist, like data.table, fst now returns a 0 by 0 table when an empty column vector is selected:
tmp_file <- tempfile(fileext = ".fst")
# write sample fst file
data.frame(
X = sample(sample(1:100, replace = TRUE))
) |>
fst::write_fst(tmp_file)
fst::read_fst(tmp_file, character(0))
#> data frame with 0 columns and 0 rows
Hope that works for you. If there is a use case where reporting the number of rows is important, please reopen this issue. Thanks!