Some SMILES strings crash the entire R session

Open lz100 opened this issue 3 years ago • 3 comments

Some SMILES strings break extractDrugLongestAliphaticChain:

library(rcdk)
library(Rcpi)
library(magrittr)
"[H]OC1=C2OC(=O)C34C5=C6C7([H])C8=C(C([H])([H])C([H])(C79C([H])([H])C5([H])C(=C([H])C([H])(C%10([H])C([H])([H])C([H])([H])C([H])([H])C%10([H])[H])C([H])([H])C4([H])C%11(OC(=O)C=%12C%11=C([H])C([H])=C([H])C%12C([H])([H])C([H])([H])C([H])([H])N([H])[H])C23C([H])([H])C6([H])[H])C([H])([H])C9([H])[H])C([H])([H])[H])C([H])([H])C([H])([H])C%13([H])N8C([H])([H])C%14([H])C%15([H])N(C%16([H])C%17(C([H])([H])C%18(C([H])([H])C%17([H])[H])C([H])([H])C([H])([H])C([H])([H])C%18([H])[H])C([H])([H])C([H])([H])C%15([H])C([H])([H])C1%16[H])C([H])([H])C%13([H])C%14([H])[H]" %>%
  parse.smiles() %>% .[[1]] %>%
  extractDrugLongestAliphaticChain()

#> Error: segfault from C stack overflow

If you skip extractDrugLongestAliphaticChain and instead run another Rcpi function on the same molecule, for example extractDrugXLogP, the entire R session crashes:

"[H]OC1=C2OC(=O)C34C5=C6C7([H])C8=C(C([H])([H])C([H])(C79C([H])([H])C5([H])C(=C([H])C([H])(C%10([H])C([H])([H])C([H])([H])C([H])([H])C%10([H])[H])C([H])([H])C4([H])C%11(OC(=O)C=%12C%11=C([H])C([H])=C([H])C%12C([H])([H])C([H])([H])C([H])([H])N([H])[H])C23C([H])([H])C6([H])[H])C([H])([H])C9([H])[H])C([H])([H])[H])C([H])([H])C([H])([H])C%13([H])N8C([H])([H])C%14([H])C%15([H])N(C%16([H])C%17(C([H])([H])C%18(C([H])([H])C%17([H])[H])C([H])([H])C([H])([H])C([H])([H])C%18([H])[H])C([H])([H])C([H])([H])C%15([H])C([H])([H])C1%16[H])C([H])([H])C%13([H])C%14([H])[H]" %>%
  parse.smiles() %>% .[[1]] %>%
  extractDrugXLogP()

 *** caught segfault ***
address 0x311000006, cause 'memory not mapped'

Traceback:
 1: .jcheck()
 2: .jcall(dval, "Lorg/openscience/cdk/qsar/result/IDescriptorResult;",     "getValue")
 3: FUN(X[[i]], ...)
 4: lapply(descvals, .get.desc.values, nexpected = length(dnames))
 5: eval.desc(molecules, "org.openscience.cdk.qsar.descriptors.molecular.XLogPDescriptor",     verbose = !silent)
 6: extractDrugXLogP(.)
 7: "[H]OC1=C2OC(=O)C34C5=C6C7([H])C8=C(C([H])([H])C([H])(C79C([H])([H])C5([H])C(=C([H])C([H])(C%10([H])C([H])([H])C([H])([H])C([H])([H])C%10([H])[H])C([H])([H])C4([H])C%11(OC(=O)C=%12C%11=C([H])C([H])=C([H])C%12C([H])([H])C([H])([H])C([H])([H])N([H])[H])C23C([H])([H])C6([H])[H])C([H])([H])C9([H])[H])C([H])([H])[H])C([H])([H])C([H])([H])C%13([H])N8C([H])([H])C%14([H])C%15([H])N(C%16([H])C%17(C([H])([H])C%18(C([H])([H])C%17([H])[H])C([H])([H])C([H])([H])C([H])([H])C%18([H])[H])C([H])([H])C([H])([H])C%15([H])C([H])([H])C1%16[H])C([H])([H])C%13([H])C%14([H])[H]" %>%     parse.smiles() %>% .[[1]] %>% extractDrugXLogP()

Possible actions:
1: abort (with core dump, if enabled)
2: normal R exit
3: exit R without saving workspace
4: exit R saving workspace

The first issue may be a CDK (Java) issue, but can we do something in the second case to prevent R from crashing?

lz100 avatar Feb 25 '22 22:02 lz100

I would recommend checking your JDK and rJava configuration. Normally, this returns a proper NA:

library(magrittr)

"CCCC" %>%
  rcdk::parse.smiles() %>%
  .[[1]] %>%
  Rcpi::extractDrugLongestAliphaticChain()

#> nAtomLAC
#> 1        4

x <- "[H]OC1=C2OC(=O)C34C5=C6C7([H])C8=C(C([H])([H])C([H])(C79C([H])([H])C5([H])C(=C([H])C([H])(C%10([H])C([H])([H])C([H])([H])C([H])([H])C%10([H])[H])C([H])([H])C4([H])C%11(OC(=O)C=%12C%11=C([H])C([H])=C([H])C%12C([H])([H])C([H])([H])C([H])([H])N([H])[H])C23C([H])([H])C6([H])[H])C([H])([H])C9([H])[H])C([H])([H])[H])C([H])([H])C([H])([H])C%13([H])N8C([H])([H])C%14([H])C%15([H])N(C%16([H])C%17(C([H])([H])C%18(C([H])([H])C%17([H])[H])C([H])([H])C([H])([H])C([H])([H])C%18([H])[H])C([H])([H])C([H])([H])C%15([H])C([H])([H])C1%16[H])C([H])([H])C%13([H])C%14([H])[H]"

x %>%
  rcdk::parse.smiles() %>%
  .[[1]] %>%
  Rcpi::extractDrugLongestAliphaticChain()

#> nAtomLAC
#> 1       NA
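
For the JDK and rJava check, a minimal sketch of how one might see which JVM rJava actually loads. The helper name java_info is hypothetical (it is not part of Rcpi or rJava); it only uses rJava::.jinit() and rJava::.jcall() to read standard Java system properties, and returns the error message instead of failing if rJava or Java cannot be loaded:

```r
# Hypothetical helper (not part of Rcpi or rJava): report the JVM that rJava uses.
java_info <- function() {
  tryCatch({
    rJava::.jinit()  # start (or attach to) the JVM
    props <- c("java.version", "java.vendor", "java.home")
    vals <- lapply(props, function(p) {
      # Static call to java.lang.System.getProperty(String), returning a String
      rJava::.jcall("java/lang/System", "S", "getProperty", p)
    })
    c(stats::setNames(vals, props), list(JAVA_HOME = Sys.getenv("JAVA_HOME")))
  }, error = function(e) {
    # rJava missing or JVM not loadable: report instead of stopping
    list(error = conditionMessage(e))
  })
}

str(java_info())
```

Comparing this output between a machine that crashes and one that does not may help narrow down whether the JVM setup differs.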

If that's really not possible, perhaps wrapping calls with callr could be a workaround to avoid crashing the main process and to allow exception handling. See an example at https://nanx.me/blog/post/disposable-computing-with-callr/.
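
A minimal sketch of the callr approach, assuming callr is installed. The helper name safe_descriptor and its arguments are hypothetical, not part of Rcpi. The SMILES string is passed to the child process and parsed there, because the molecule returned by parse.smiles() is a Java object reference that is not valid in another process. A crash or timeout in the child should surface in the main session as an ordinary R error, which is turned into NA here:

```r
# Hypothetical helper: compute one Rcpi descriptor in a disposable R process.
# `smiles` is a SMILES string; `descriptor` is the name of an exported Rcpi function.
safe_descriptor <- function(smiles, descriptor = "extractDrugXLogP", timeout = 60) {
  tryCatch(
    callr::r(
      function(smiles, descriptor) {
        # Everything in this function runs in the child process
        mol <- rcdk::parse.smiles(smiles)[[1]]
        fun <- getExportedValue("Rcpi", descriptor)
        fun(mol)
      },
      args = list(smiles = smiles, descriptor = descriptor),
      timeout = timeout  # seconds before the child is killed
    ),
    error = function(e) {
      # Child crashed, timed out, or raised an error: keep the main session alive
      warning("descriptor failed in child process: ", conditionMessage(e))
      NA
    }
  )
}

# Usage: safe_descriptor(x, "extractDrugXLogP"), with x being the SMILES string above
```

Starting a new R process (and a new JVM) for each call is slow, so for many molecules it is probably better to process a batch of SMILES strings per child process.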

nanxstats avatar Feb 26 '22 07:02 nanxstats

Thanks for the recommendation. I suspect this is OS-related: my attempts on both CentOS 7 and Ubuntu 20 crashed. Are you using a non-Linux system?

lz100 avatar Feb 26 '22 23:02 lz100

Just tested on macOS with Oracle JDK installed via Homebrew Cask, and on Windows 10 with Amazon Corretto JDK installed via Chocolatey. Both work OK out of the box.

nanxstats avatar Feb 27 '22 22:02 nanxstats