Writing multiple fst files in parallel and renaming them may produce corrupted files
The following code is a minimal example of what I do to generate many data.tables and store them as fst files in parallel. To ensure that all fst files written to disk are complete, I first write the data to a .fst.tmp file and then rename it to the final .fst file.
If things work normally, all fst files written to disk should be complete and correct, which is true in most cases. But occasionally, especially when the CPU load is very high during generation, a proportion of the fst files are somehow corrupted: when I call fst::read_fst on those files, the R session crashes with a "memory not mapped" error.
The corruption is hard to reproduce, but I have already encountered it multiple times. It's still hard to see the cause, and I'm not even sure whether the problem lies in fst or in future, or whether such a workflow is fundamentally unsafe.
library(future)
library(data.table)

ids <- 1:1000
plan(multisession, workers = 20, gc = TRUE, earlySignal = TRUE)

dir_output <- "~/output"
dir.create(dir_output, showWarnings = FALSE)

future.apply::future_lapply(ids, function(id) {
  filename <- sprintf("%d.fst", id)
  outfile <- file.path(dir_output, filename)
  if (file.exists(outfile)) return(NULL)
  cat(id, "\n")
  # compute a big data.table here
  data <- data.table(id = 1:200000)
  for (i in 1:200) {
    data[, paste0("x", i) := runif(.N)]
  }
  # write to a temporary file first, then rename it to the final name
  tmpfile <- paste0(outfile, ".tmp")
  fst::write_fst(data, tmpfile, compress = 100)
  file.rename(tmpfile, outfile)
})
As I recall from the logs, in the corruption cases file.rename() returns FALSE, yet the file system shows that the .fst.tmp file has indeed been renamed to .fst. I therefore suspect that the rename system call fails but leaves the file renamed rather than rolling back.
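For reference, a more defensive variant of the rename step could look like this (a sketch reusing the tmpfile and outfile names from the example above, not the original code):
# Sketch: fail loudly if the rename reports failure, and do a cheap sanity
# read so a corrupted file surfaces immediately rather than much later.
if (!isTRUE(file.rename(tmpfile, outfile))) {
  stop("file.rename() returned FALSE for ", outfile)
}
invisible(fst::read_fst(outfile, to = 1))  # reading one row forces header parsing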
Hi @renkun-ken, thanks for reporting the problem!
From your example, it's a bit hard to see what's going on exactly. If date is identical for two or more threads, aren't you trying to write data to the same outfile from multiple workers? If so, that will definitely pose problems, because file.rename() will surely fail if the target is still open and being written to.
(please correct me if that's not what your sample is showing)
Sorry, date should be id; I just copied and simplified my real code. Each id corresponds to one file.
Thanks, yes, for these kinds of problems it's very difficult to pinpoint the exact cause. I've run the code quite a few times now but didn't get any crashes or FALSE returns from file.rename(), though that might just be statistics, of course. What kind of disk are you using: a local drive or a network drive?
Perhaps the system tries to rename the file while it's still being written to (some timing problem in the system). Does the same problem occur when you're not using the file.rename() strategy but just write to outfile directly?
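For example, the direct-write variant would simply be (a sketch, reusing data and outfile from the example above):
# Sketch: skip the .tmp + rename step and write straight to the final path.
fst::write_fst(data, outfile, compress = 100)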
To diagnose the corrupt files that you're getting, we could really use some (hidden) method verify_fst() or similar that thoroughly checks all the hashes, jumps, block sizes and the consistency of the metadata. Also, there are safe (but slower) versions of the LZ4 and ZSTD methods used to decompress data blocks; those could be used in such a method to check the validity of the data blocks.
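Until such a verify_fst() exists, a rough stand-in could be a helper like the following (check_fst is hypothetical, not part of the fst API; note that tryCatch() only catches R-level errors, not a hard segfault such as "memory not mapped", which is exactly why safe decompressors inside a real verify_fst() would be needed):
check_fst <- function(path) {
  tryCatch({
    fst::metadata_fst(path)         # parses and validates the file metadata
    invisible(fst::read_fst(path))  # forces all data blocks to decompress
    TRUE
  }, error = function(e) FALSE)
}
# Flag every fst file in the output directory that fails the check.
corrupt <- Filter(Negate(check_fst),
                  list.files(dir_output, pattern = "\\.fst$", full.names = TRUE))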
Without better diagnostics, I suspect it might be difficult to find the exact cause of the corrupt files that you get occasionally...
Did you ever get corrupt files when you were only using a single node to write to disk (without a cluster)?
Thanks @renkun-ken!