
`BPCells::svds` crashes for `threads > 1`

Open Artur-man opened this issue 8 months ago • 3 comments

Hello,

My R session crashes if I pass threads to svds as an integer greater than 1. Is this the proper way to use it?

svd <- BPCells::svds(normdata, k=30, threads = 2L)
pr.data <- BPCells::multiply_cols(svd$v, svd$d)

Artur-man avatar Apr 22 '25 19:04 Artur-man

Hi @Artur-man, thanks again for using BPCells! I'm unable to reproduce the crash on my end. Can you share a bit more about the data in normdata and about your system setup?

It runs successfully for me with a small matrix:

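library(BPCells)
library(magrittr)  # provides %>%
mat <- get_demo_mat()
# Scale each column by the reciprocal of its total, log-transform, then store on disk
mat <- log1p(multiply_cols(mat, 1/Matrix::colSums(mat))) %>% write_matrix_dir(tempdir(), overwrite = TRUE)
svd <- svds(mat, k = 30, threads = 2L)
multiply_cols(svd$v, svd$d)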

immanuelazn avatar Apr 22 '25 20:04 immanuelazn

Hey @immanuelazn,

I did a quick check with write_matrix_dir, and the crash appears to be specific to write_matrix_hdf5. It also occurs with dummy data.

So, this works:

library(BPCells)
m <- matrix(data = seq_len(100*100), nrow=100) |>
  as("IterableMatrix")
file <- tempdir()
m <- write_matrix_dir(m, dir = file)
svd <- BPCells::svds(m, k=30, threads=2L)

but this doesn't work and crashes the R session:

library(BPCells)
m <- matrix(data = seq_len(100*100), nrow=100) |>
  as("IterableMatrix")
file <- tempfile(fileext = ".h5")
m <- write_matrix_hdf5(m, path = file, group = "name")
svd <- BPCells::svds(m, k=30, threads=2L)

Here is the crash log when I run R in a terminal:

 *** caught segfault ***

 *** caught segfault ***
address 0x1, cause 'invalid permissions'
address 0x1, cause 'invalid permissions'
Error in svds_cpp(it, k, solver_params[["ncv"]], solver_params[["maxitr"]],  :
  bad value
Error in svds_cpp(it, k, solver_params[["ncv"]], solver_params[["maxitr"]],  :
  bad value
R(6798,0x1727bf000) malloc: Double free of object 0x19b91e000
R(6798,0x1727bf000) malloc: *** set a breakpoint in malloc_error_break to debug
Abort trap: 6

Artur-man avatar Apr 23 '25 08:04 Artur-man

I've been able to reproduce this on a Mac and have made a few observations:

  • Using the Homebrew HDF5 library on macOS hits the crash
  • Using the BPCells binary from R-universe on macOS hits the crash
  • Using a custom-compiled HDF5 with the thread-safety option turned on works on macOS without a crash
  • This example does not crash when run on Linux, even without a thread-safe HDF5 build

My current conclusions are:

  1. This crash is specifically due to HDF5 not being thread-safe
  2. Technically, we should probably treat HDF5 reads as not thread-safe everywhere, even though we've only seen crashes on Macs

The immediate workaround is to save the input in a non-HDF5 format (for example with write_matrix_dir), but in general it should be impossible to cause this kind of session crash with BPCells. For the fix, I'd propose:

  1. Make BPCells check whether HDF5 defines the H5_HAVE_THREADSAFE macro (the vast majority of builds will not be thread-safe)
  2. When building against a non-thread-safe HDF5 library, put all HDF5 API calls behind a global lock
  3. Ideally, figure out a way to print a performance warning when someone runs a multi-threaded operation with HDF5 file inputs, so they know the data reads are being forced to happen single-threaded
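
The three steps above could be sketched roughly as follows. This is only an illustration of the idea, not BPCells code: `Hdf5Guard`, `hdf5_global_mutex`, `fake_hdf5_read`, `run_guarded_reads`, and `should_warn_serialized_hdf5` are hypothetical names, and the "read" is a stand-in counter so the sketch compiles without HDF5 installed. A real implementation would include the HDF5 headers before the `#ifdef` so that `H5_HAVE_THREADSAFE` is visible.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Assumption: real code includes <hdf5.h> before this check, so the macro
// reflects how the linked HDF5 library was built. In this standalone sketch
// the macro is undefined, so the mutex branch is the one that compiles.
#ifdef H5_HAVE_THREADSAFE
// Thread-safe HDF5 build: no extra locking needed, the guard does nothing.
struct Hdf5Guard {
    Hdf5Guard() {}
};
constexpr bool kHdf5ReadsSerialized = false;
#else
// Non-thread-safe HDF5 build: one process-wide mutex shared by all callers.
inline std::mutex &hdf5_global_mutex() {
    static std::mutex m;
    return m;
}
struct Hdf5Guard {
    Hdf5Guard() : lock_(hdf5_global_mutex()) {}
    std::lock_guard<std::mutex> lock_;
};
constexpr bool kHdf5ReadsSerialized = true;
#endif

// Hypothetical stand-in for an HDF5 read. It mutates shared state without
// any synchronization of its own, so it relies entirely on the guard.
static long g_fake_reads = 0;

void fake_hdf5_read() {
    Hdf5Guard guard;  // held until the end of this function
    ++g_fake_reads;   // real code would make its HDF5 API calls here
}

// Step 3: decide whether a multi-threaded caller should be warned that
// HDF5 reads are being forced through a single lock.
bool should_warn_serialized_hdf5(int threads) {
    return kHdf5ReadsSerialized && threads > 1;
}

// Run the stand-in read from several threads and return the total count.
long run_guarded_reads(int n_threads, int reads_per_thread) {
    assert(n_threads > 0 && reads_per_thread >= 0);
    g_fake_reads = 0;
    std::vector<std::thread> workers;
    for (int t = 0; t < n_threads; ++t) {
        workers.emplace_back([reads_per_thread] {
            for (int i = 0; i < reads_per_thread; ++i) fake_hdf5_read();
        });
    }
    for (auto &w : workers) w.join();
    return g_fake_reads;
}
```

With the lock in place, every thread still makes progress, but the guarded section runs one thread at a time. That is the performance cost the warning in step 3 is meant to surface.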

bnprks avatar Apr 29 '25 07:04 bnprks