xml_add_parent produces a segfault in for loop

Open AleKoure opened this issue 3 years ago • 3 comments

While developing a plumber API with xml2, I ran into the following error under a small stress test. I reproduced it with a minimal example on my local machine.

The following code chunk prints a few results and then segfaults:

library(xml2)

xx <- function() {
  x <- read_xml("<fruits><apple color='red'></apple></fruits>")
  xml_add_parent(x, read_xml("<food></food>"))
  print(as.character(x))
}

for (i in 1:1000) xx()
[1] "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<food>\n  <fruits>\n    <apple color=\"red\"/>\n  </fruits>\n</food>\n"
[1] "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<food>\n  <fruits>\n    <apple color=\"red\"/>\n  </fruits>\n</food>\n"
[1] "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<food>\n  <fruits>\n    <apple color=\"red\"/>\n  </fruits>\n</food>\n"

 *** caught segfault ***
address 0x55ff44000000, cause 'memory not mapped'

Possible actions:
1: abort (with core dump, if enabled)
2: normal R exit
3: exit R without saving workspace
4: exit R saving workspace
Selection: 

R version 4.0.4 (2021-02-15)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.2 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=el_GR.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=el_GR.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=el_GR.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=el_GR.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] xml2_1.3.2         plumber_1.1.0.9000

loaded via a namespace (and not attached):
 [1] compiler_4.0.4   magrittr_2.0.1   R6_2.5.0         later_1.2.0     
 [5] promises_1.2.0.1 tools_4.0.4      swagger_3.33.1   Rcpp_1.0.6      
 [9] stringi_1.6.1    jsonlite_1.7.2   webutils_1.1     lifecycle_1.0.0 
[13] rlang_0.4.11    

You can work around it, for example, by using xml_add_child and xml_replace instead.
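A minimal sketch of how that workaround might look, assuming the goal is to wrap an existing child node in a new element. The variable names and the order of the two calls are illustrative assumptions, not something confirmed in this thread, and the sketch is not verified to avoid the crash in every setup. It relies on both functions copying the node they insert by default.

```r
library(xml2)

doc <- read_xml("<parent><child1>Hello</child1></parent>")
child <- xml_find_first(doc, "./child1")
new_node <- read_xml("<new_node>New text</new_node>")

# Step 1: put a copy of the child inside the wrapper element.
# xml_add_child() copies the value by default, so `doc` is not touched yet.
xml_add_child(new_node, child)

# Step 2: replace the original child in `doc` with a copy of the wrapper,
# which now contains the child.
xml_replace(child, new_node)

# Inspect the result: <parent> should now contain <new_node>,
# which in turn contains <child1>.
wrapped <- xml_find_first(doc, "./new_node/child1")
print(as.character(doc))
```

After step 2 the variable `child` still refers to the node that was replaced, so further work should go through `doc` or `wrapped` instead.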

AleKoure, May 14 '21 07:05

Hi, I'm also seeing R crash when xml_add_parent is used in combination with other code. As a minimal example, R crashes when the code below is run three times. When I originally found the problem, I was calling xml_add_parent only once in a script with many other function calls, but I'm afraid I don't know how to create a minimal example for that case.

library(xml2)

# Create XML document
doc <- read_xml("<parent><child1>Hello</child1></parent>")

# Check current elements
children <- xml_children(doc)

new_node <- read_xml('<new_node>New text</new_node>')
xml_add_parent(children, new_node)

# Show that the parent node has been added
doc
#> {xml_document}
#> <parent>
#> [1] <new_node>New text<child1>Hello</child1></new_node>

If I run the above in a loop, R crashes, e.g.:

library(xml2)

for (i in 1:3){  
  # Create XML document
  doc <- read_xml("<parent><child1>Hello</child1></parent>")
  
  # Check current elements
  children <- xml_children(doc)
  #expect_equal(xml_text(children), c("Hello"))
  
  new_node <- read_xml('<new_node>New text</new_node>')
  xml_add_parent(children, new_node)
  
  doc

}

reprex produces this:

This reprex appears to crash R. See standard output and standard error for more details.

Standard output and error


*** caught segfault ***
  address 0x5610c8000000, cause 'memory not mapped'
An irrecoverable exception occurred. R is aborting now ...

Or this:

This reprex appears to crash R. See standard output and standard error for more details.

Standard output and error

free(): invalid pointer

Thanks to AleKoure for pointing out the workaround and helping me locate which part of my code was crashing my R session.

Created on 2021-06-02 by the reprex package (v2.0.0)

Session info
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value                       
#>  version  R version 4.0.4 (2021-02-15)
#>  os       CentOS Linux 8              
#>  system   x86_64, linux-gnu           
#>  ui       X11                         
#>  language (EN)                        
#>  collate  en_US.UTF-8                 
#>  ctype    en_US.UTF-8                 
#>  tz       UTC                         
#>  date     2021-06-02                  
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version date       lib source        
#>  backports     1.2.1   2020-12-09 [2] CRAN (R 4.0.4)
#>  cli           2.4.0   2021-04-05 [2] CRAN (R 4.0.4)
#>  crayon        1.4.1   2021-02-08 [2] CRAN (R 4.0.4)
#>  digest        0.6.27  2020-10-24 [2] CRAN (R 4.0.4)
#>  ellipsis      0.3.1   2020-05-15 [2] CRAN (R 4.0.4)
#>  evaluate      0.14    2019-05-28 [2] CRAN (R 4.0.4)
#>  fansi         0.4.2   2021-01-15 [2] CRAN (R 4.0.4)
#>  fs            1.5.0   2020-07-31 [2] CRAN (R 4.0.4)
#>  glue          1.4.2   2020-08-27 [2] CRAN (R 4.0.4)
#>  highr         0.9     2021-04-16 [2] CRAN (R 4.0.4)
#>  htmltools     0.5.1.1 2021-01-22 [2] CRAN (R 4.0.4)
#>  knitr         1.32    2021-04-14 [2] CRAN (R 4.0.4)
#>  lifecycle     1.0.0   2021-02-15 [2] CRAN (R 4.0.4)
#>  magrittr      2.0.1   2020-11-17 [2] CRAN (R 4.0.4)
#>  pillar        1.6.0   2021-04-13 [2] CRAN (R 4.0.4)
#>  pkgconfig     2.0.3   2019-09-22 [2] CRAN (R 4.0.4)
#>  purrr         0.3.4   2020-04-17 [2] CRAN (R 4.0.4)
#>  reprex        2.0.0   2021-04-02 [2] CRAN (R 4.0.4)
#>  rlang         0.4.10  2020-12-30 [2] CRAN (R 4.0.4)
#>  rmarkdown     2.7     2021-02-19 [2] CRAN (R 4.0.4)
#>  sessioninfo   1.1.1   2018-11-05 [2] CRAN (R 4.0.4)
#>  stringi       1.5.3   2020-09-09 [2] CRAN (R 4.0.4)
#>  stringr       1.4.0   2019-02-10 [2] CRAN (R 4.0.4)
#>  styler        1.4.1   2021-03-30 [2] CRAN (R 4.0.4)
#>  tibble        3.1.1   2021-04-18 [2] CRAN (R 4.0.4)
#>  utf8          1.2.1   2021-03-12 [2] CRAN (R 4.0.4)
#>  vctrs         0.3.7   2021-03-29 [2] CRAN (R 4.0.4)
#>  withr         2.4.2   2021-04-18 [2] CRAN (R 4.0.4)
#>  xfun          0.22    2021-03-11 [2] CRAN (R 4.0.4)
#>  xml2        * 1.3.2   2020-04-23 [2] CRAN (R 4.0.4)
#>  yaml          2.2.1   2020-02-01 [2] CRAN (R 4.0.4)
#> 

erp31, Jun 02 '21 13:06

for (i in 1:500){  
  print(i)
  doc <- xml2::read_xml("<a><b>a</b></a>")
  children <- xml2::xml_children(doc)
  xml2::xml_add_parent(children, xml2::read_xml('<c>d</c>'))
}

On my machine, this loop gets to iteration 60 and then a segfault is triggered.

The trigger is xml_add_parent. xml_add_child and xml_add_sibling won't trigger the segfault.

chainsawriot, Nov 01 '21 22:11

A possible solution is to add .copy = TRUE in xml_replace(), which makes the function stable across iterations, but I guess this will have some impact on performance.
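The comment does not say whether this means the xml_replace() call inside xml_add_parent() or a call in user code. As an illustration only, and assuming the user-level reading, the sketch below shows what .copy = TRUE does: a copy of the value is inserted, so the source document keeps its own node. The names `src` and `target` are made up for the example.

```r
library(xml2)

doc <- read_xml("<a><b>a</b></a>")
src <- read_xml("<c>d</c>")
target <- xml_find_first(doc, "./b")

# With .copy = TRUE a copy of src's root is inserted into `doc`;
# `src` itself is left as it was.
xml_replace(target, src, .copy = TRUE)

print(xml_name(xml_children(doc)))
print(as.character(src))
```

Copying costs an extra allocation for each call, which is presumably the performance impact mentioned above.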

alexverse, Dec 15 '23 07:12