Segfault encountered with coop::pcor()
Greetings!
I just encountered a segfault for the first time while attempting to construct a correlation matrix for a dataset with dimensions 52,232 x 32.
I'm running coop with R 4.0.2, installed via conda, on a Linux machine.
The system has plenty of memory (128 GB), and coop doesn't appear to come anywhere near that limit before segfaulting.
I've used coop::pcor() for a number of other datasets and never encountered an issue before, so it seems to be something that occurs quite infrequently.
Stack-trace:
*** caught segfault ***
*** caught segfault ***
*** caught segfault ***
*** caught segfault ***
address 0x7fb9c71d11f0, cause 'memory not mapped'
*** caught segfault ***
*** caught segfault ***
address 0x7fb9c71ce670, cause 'memory not mapped'
*** caught segfault ***
address 0x7fb9c71dbfb8, cause 'memory not mapped'
*** caught segfault ***
address 0x7fb9c71daa08, cause 'memory not mapped'
*** caught segfault ***
address 0x7fb9c71deb18, cause 'memory not mapped'
*** caught segfault ***
address 0x7fb9c71d7ea8, cause 'memory not mapped'
*** caught segfault ***
address 0x7fb9c71cf150, cause 'memory not mapped'
*** caught segfault ***
address 0x7fb9c71d4850, cause 'memory not mapped'
*** caught segfault ***
address 0x7fb9c71d0710, cause 'memory not mapped'
*** caught segfault ***
address 0x7fb9c71db4e0, cause 'memory not mapped'
*** caught segfault ***
address 0x7fb9c71d9458, cause 'memory not mapped'
address 0x7fb9c71d9f30, cause 'memory not mapped'
*** caught segfault ***
address 0x7fb9c71de040, cause 'memory not mapped'
*** caught segfault ***
address 0x7fb9c71e1678, cause 'memory not mapped'
*** caught segfault ***
address 0x7fb9c71d1cd0, cause 'memory not mapped'
*** caught segfault ***
address 0x7fb9c71d5e10, cause 'memory not mapped'
*** caught segfault ***
address 0x7fb9c71d8980, cause 'memory not mapped'
*** caught segfault ***
address 0x7fb9c71d68f0, cause 'memory not mapped'
*** caught segfault ***
address 0x7fb9c71e0ba0, cause 'memory not mapped'
An irrecoverable exception occurred. R is aborting now ...
Warning: stack imbalance in '$', 34 then 30
Traceback:
1: co_matrix(x, y, CO_ORR, use, inplace, trans = FALSE, inverse = inverse)
2: pcor.matrix(t(dat), use = "pairwise.complete")
3: coop::pcor(t(dat), use = "pairwise.complete")
4: eval(expr, envir, enclos)
5: eval(expr, envir, enclos)
6: withVisible(eval(expr, envir, enclos))
7: withCallingHandlers(withVisible(eval(expr, envir, enclos)), warning = wHandler, error = eHandler, message = mHandler)
8: handle(ev <- withCallingHandlers(withVisible(eval(expr, envir,
*** caught segfault ***
address 0x7fb9c71dd568, cause 'memory not mapped'
I checked the core dump but didn't notice anything more informative, only that the signal was "11 (SEGV)".
Let me know if there is any additional information I can help provide.
sessionInfo():
R version 4.0.2 (2020-06-22)
Platform: x86_64-conda_cos6-linux-gnu (64-bit)
Running under: Arch Linux
Matrix products: default
BLAS/LAPACK: /mnt/storage/conda/envs/snakes/lib/libopenblasp-r0.3.10.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8
[4] LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] coop_0.6-2
loaded via a namespace (and not attached):
[1] compiler_4.0.2 parallel_4.0.2
Thanks for the report. That matrix is only ~12 MB so I don't think there's any issue there. I'm not able to replicate by running a quick test:
x = matrix(rnorm(52232*32), ncol=32)
str(coop::pcor(x))
## num [1:32, 1:32] 1 0.000847 -0.007405 -0.001866 -0.003212 ...
If I run that script through valgrind, I see no errors. My guess is that it's a bug in OpenBLAS. I've seen several recently, so it's not out of the realm of possibility. Could you try testing with a different BLAS library?
Edit: Just checked and I'm using OpenBLAS 0.3.8 in my test.
You might also try with 1 OMP thread just to see what happens. You can launch R from the command line via
OMP_NUM_THREADS=1 R
or you could use something like this https://github.com/wrathematics/openblasctl
Hi @wrathematics
Thanks for the quick response and suggestions!
I am actually attempting to compute correlation matrices for both x and t(x), so memory may be a bigger issue for the latter.
I still would not expect the operation to consume more than the system's 128 GB. However, the base cor() function also fails, so perhaps I am missing some overhead in my calculations?
Even so, segfaults are obviously never desirable, so it seems like this still may be an issue even in the case where there is no hope of it succeeding on a given system.
I tried using a single OMP thread via both approaches (OMP_NUM_THREADS=1 and openblasctl), but the result is the same for both.
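For reference, the size of each result can be estimated directly from the dimensions given in this thread. This is a back-of-the-envelope sketch that assumes dense storage of 8-byte doubles and ignores any temporary copies made by cor() or coop; the variable names are illustrative only.

```r
# Dimensions reported in this thread (x is 52,232 x 32).
n_rows <- 52232
n_cols <- 32
bytes_per_double <- 8

# Plain numeric (double) arithmetic is used on purpose:
# 52232^2 does not fit in R's 32-bit integer type.
input_mb    <- n_rows * n_cols * bytes_per_double / 1e6  # the data itself
cor_x_mb    <- n_cols^2 * bytes_per_double / 1e6         # cor of x:    32 x 32
cor_tx_gb   <- n_rows^2 * bytes_per_double / 1e9         # cor of t(x): 52232 x 52232

cat(sprintf("input:        %.1f MB\n", input_mb))
cat(sprintf("cor of x:     %.3f MB\n", cor_x_mb))
cat(sprintf("cor of t(x):  %.1f GB\n", cor_tx_gb))
```

Under these assumptions the t(x) result alone is roughly 21.8 GB, which is large but well within 128 GB unless several full-size temporaries are allocated at once.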
Thanks, the transpose is the problem. There was an indexing issue that I've corrected for the pairwise complete case in the latest commit. I'll fix the remaining cases as I get time.
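The thread does not say what the indexing issue was. As an illustration only, one general class of indexing problem that appears at this size is 32-bit integer overflow: the 52,232 x 52,232 result has more elements than a 32-bit signed integer can count. The sketch below only demonstrates that arithmetic fact in R; it is an assumption that this is relevant here, not a description of the actual fix.

```r
n <- 52232

# Element count of an n x n matrix, computed in double precision.
n_elements <- n * n
exceeds_int <- n_elements > .Machine$integer.max

# The same product in R's integer type overflows. R reports this as NA
# with a warning; compiled code using a 32-bit int index has no such
# safeguard.
overflowed <- suppressWarnings(52232L * 52232L)

cat("elements:", format(n_elements, big.mark = ","), "\n")
cat("exceeds 32-bit integer range:", exceeds_int, "\n")
cat("integer product is NA:", is.na(overflowed), "\n")
```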
Great! Thanks for the quick fix! I'll do some testing with the pairwise complete case and let you know if I come across any more issues.
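A small-scale check along these lines might look like the sketch below. It compares a correlation function against base R's cor() on a small matrix with missing values. `check_pairwise` is a hypothetical helper written for this example, not part of coop, and the coop call is only run if the package is installed. A test this small will not reproduce the original crash; it only checks that the pairwise complete results agree.

```r
# Hypothetical helper: TRUE if cor_fun(m) matches base cor() with
# pairwise complete observations, ignoring dimnames.
check_pairwise <- function(m, cor_fun, tol = 1e-8) {
  expected <- cor(m, use = "pairwise.complete.obs")
  actual <- cor_fun(m)
  isTRUE(all.equal(unname(actual), unname(expected), tolerance = tol))
}

# Small test matrix with 5% of entries missing.
set.seed(1)
m <- matrix(rnorm(200 * 6), ncol = 6)
m[sample(length(m), 60)] <- NA

# Same `use` string as in the traceback above.
if (requireNamespace("coop", quietly = TRUE)) {
  agrees <- check_pairwise(m, function(z) coop::pcor(z, use = "pairwise.complete"))
  cat("coop::pcor agrees with cor():", agrees, "\n")
}
```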