Can't read 0 columns of a fst file

Open Kodiologist opened this issue 3 years ago • 4 comments

> read.fst(..., columns = character())
Error in res$resTable[[1]] : subscript out of bounds

If you're wondering why I'd want to do this, it's that I just wanted to compute how many rows are in a large file as fast as possible.

Kodiologist avatar Mar 05 '21 19:03 Kodiologist

@Kodiologist, if counting rows is the only goal, you can simply use fst::metadata_fst() (you can try it on the example code from its documentation):

fst::metadata_fst(fst_file)$nrOfRows

The documentation linked above is not very explicit about this, but you can inspect the structure of the classed list returned by metadata_fst() via

str(fst::metadata_fst(fst_file))

Alternatively, you can also use fst::fst()

ft <- fst::fst(fst_file)
nrow(ft)

which, however, is an indirect method based on fst::metadata_fst() and may add some (minor) overhead.
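As a rough, self-contained sketch of the two approaches above (assuming the fst package is installed; the temporary file and the sample data are made up for illustration), both should report the same row count:

```r
# Illustrative data only: a one-column table with 10 rows
tmp_file <- tempfile(fileext = ".fst")
fst::write_fst(data.frame(X = 1:10), tmp_file)

# Row count taken directly from the file's metadata
n_meta <- fst::metadata_fst(tmp_file)$nrOfRows

# Row count taken through the fst table reference
n_proxy <- nrow(fst::fst(tmp_file))

# Both approaches should agree
n_meta == n_proxy
```

Neither call asks for any columns, so this avoids the zero-column read that triggers the error.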

On the other hand, it might be good if fst::read_fst() handled a zero-length column selection and consistently returned a data.frame with the expected number of rows and 0 columns.

riccardoporreca avatar Mar 08 '21 23:03 riccardoporreca

Oh, I'd neglected the metadata_fst and fst functions. Thanks. In that case, I guess there's no real need to make this work, although the error message could probably be better.

Kodiologist avatar Mar 09 '21 13:03 Kodiologist

Hi @Kodiologist, thanks for your question! As @riccardoporreca shows in his comment, most metadata can be retrieved from a fst file by applying the usual functions on the corresponding fst table object:

library(magrittr)  # provides the %>% pipe used below

tmp_file <- tempfile(fileext = ".fst")

# write some data to a fst file
data.frame(X = 1:10) %>%
  fst::write_fst(tmp_file)

# get a reference to the fst file store
ft <- fst::fst(tmp_file)

# the number of rows
nrow(ft)
#> [1] 10

# or number of columns
ncol(ft)
#> [1] 1

# column names
colnames(ft)
#> [1] "X"

# row names
rownames(ft)
#>  [1] "1"  "2"  "3"  "4"  "5"  "6"  "7"  "8"  "9"  "10"

The dev version now returns a zero-column table:

fst::read_fst(tmp_file, character())
#> data frame with 0 columns and 0 rows

Would you prefer to have this function return a zero-column, 10-row table, as a data.frame does?

data.frame(X = 1:10)[, character()]
#> data frame with 0 columns and 10 rows

(the behavior of data.frames, data.tables and tibbles is not very consistent in this case)

data.frame(X = 1:10)[, character()]
#> data frame with 0 columns and 10 rows

tibble::tibble(X = 1:10)[, character()]
#> # A tibble: 10 x 0

data.table::data.table(X = 1:10)[, .()]
#> Null data.table (0 rows and 0 cols)
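The data.frame case can also be checked programmatically. This is a small base-R-only sketch (tibble and data.table are left out so it has no dependencies); `drop = FALSE` is added here as a precaution and is not part of the snippets above:

```r
df <- data.frame(X = 1:10)

# Select zero columns by passing an empty character vector
empty <- df[, character(), drop = FALSE]

# The row count survives even though no columns remain
dim(empty)
nrow(empty) == nrow(df)
```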

MarcusKlik avatar Mar 12 '21 22:03 MarcusKlik

Thanks for the tips. My impression is that returning a 0-column, n-row data frame (or data table) is most logical, because it's consistent with the usual case of selecting nonzero columns.

Kodiologist avatar Mar 12 '21 22:03 Kodiologist

Hi @Kodiologist, like with data.table, fst now returns a 0 by 0 table when an empty column vector is selected:

tmp_file <- tempfile(fileext = ".fst")

# write sample fst file
data.frame(
  X = sample(1:100, replace = TRUE)
) |>
  fst::write_fst(tmp_file)

fst::read_fst(tmp_file, character(0))
#> data frame with 0 columns and 0 rows

Hope that works for you. If there is a use case where reporting the number of rows is important, please reopen this issue. Thanks!
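If the row count does matter for an empty selection, one possible workaround is a thin wrapper around read_fst(). The function name `read_fst_keep_rows` below is hypothetical and not part of fst; this is only a sketch, assuming the fst package is installed:

```r
# Hypothetical helper (not part of fst): for an empty column selection,
# return a 0-column data.frame that keeps the file's row count.
read_fst_keep_rows <- function(path, columns = NULL) {
  if (!is.null(columns) && length(columns) == 0L) {
    n <- fst::metadata_fst(path)$nrOfRows
    # data.frame(row.names = ...) builds a frame with n rows and no columns
    return(data.frame(row.names = seq_len(n)))
  }
  # Otherwise defer to the regular reader
  fst::read_fst(path, columns = columns)
}

# Illustrative usage with made-up data
tmp_file <- tempfile(fileext = ".fst")
fst::write_fst(data.frame(X = 1:10), tmp_file)

res <- read_fst_keep_rows(tmp_file, character(0))
dim(res)
```

The wrapper never passes an empty selection to read_fst(), so it should behave the same whether the installed version errors or returns a 0 by 0 table in that case.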

MarcusKlik avatar Nov 16 '22 12:11 MarcusKlik