Progress bar when read/write

Open matthewgson opened this issue 3 years ago • 1 comments

Thank you for creating this awesome package; it has been my go-to whenever I save big files to disk. I would like to see a progress bar when I read or write a big file. Is there a plan to implement a simple progress bar option for reading and writing fst files in the future?

matthewgson, Jul 22 '21 12:07

Hi @matthewgson, thanks for your request!

A progress bar would be very nice, but the actual call to the fstlib C++ library doesn't return until the complete file has been read. We could create a hook that fstlib calls to update a progress bar, but that seems like overkill (and would add more dependencies to the fst package).

If you want feedback when reading very large files, you could read the file in chunks and update a progress bar after each chunk. Would that work for you?

library(dplyr)
library(fst)
library(progress)

# function to read and show progress
read_fst_progress <- function(path, columns = NULL) {

  nr_of_rows <- metadata_fst(path)$nrOfRows

  # determine chunks
  nr_of_chunks <- 100
  chunk_size <- 1 + (nr_of_rows - 1) %/% nr_of_chunks  # take partial chunks into account
  nr_of_chunks <- ceiling(nr_of_rows / chunk_size)  # fewer chunks for small files

  pb <- progress_bar$new(total = nr_of_chunks)

  lapply(seq_len(nr_of_chunks), function(chunk) {

    y <- read_fst(
      path,  # read from the file passed in, not a global variable
      columns = columns,
      from = 1 + (chunk - 1) * chunk_size,
      to = min(chunk * chunk_size, nr_of_rows)
    )

    pb$tick()  # advance the bar once the chunk has been read
    y
  }) %>%
    bind_rows()
}

# write sample fst file
tmp_file <- tempfile(fileext = ".fst")
nr_of_rows <- 1e6
data.frame(
  X = sample(1:100, nr_of_rows, replace = TRUE),
  Y = LETTERS[sample(1:26, nr_of_rows, replace = TRUE)]
) %>%
  write_fst(tmp_file)

y <- read_fst_progress(tmp_file)

#> [===========================================================>------------------------------------------]  59%
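
The chunk arithmetic is the part that is easiest to get wrong, so here is a minimal base-R sketch that computes the row ranges on their own. The helper name `chunk_bounds` is hypothetical and not part of fst; it only assumes that rows are numbered from 1, as the `from` and `to` arguments of `read_fst` expect.

```r
# Hypothetical helper (not part of fst): row ranges for reading a file in chunks.
chunk_bounds <- function(nr_of_rows, nr_of_chunks = 100) {
  stopifnot(nr_of_rows >= 1, nr_of_chunks >= 1)

  # ceiling division, so the last chunk may be shorter than the others
  chunk_size <- 1 + (nr_of_rows - 1) %/% nr_of_chunks

  # first and last row of each chunk; no chunk starts beyond the last row
  from <- seq(1, nr_of_rows, by = chunk_size)
  to <- pmin(from + chunk_size - 1, nr_of_rows)

  data.frame(from = from, to = to)
}

bounds <- chunk_bounds(1e6)
nrow(bounds)  # 100 chunks of 10,000 rows each
```

Each row of `bounds` can then be passed to `read_fst(path, from = bounds$from[i], to = bounds$to[i])`, with one progress bar tick per row. A file with fewer rows than the requested number of chunks simply yields fewer ranges, for example 75 ranges of 2 rows for a 150-row file.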

MarcusKlik, Nov 16 '22 12:11