xml2
xml_add_parent produces a segfault in for loop
While developing a plumber API with xml2, I ran into the following error under a small stress test. I reproduced it with a minimal example on my local machine.
The following code chunk produces a segfault:
library(xml2)
xx <- function() {
  x <- read_xml("<fruits><apple color='red'></apple></fruits>")
  xml_add_parent(x, read_xml("<food></food>"))
  print(as.character(x))
}
for (i in 1:1000) xx()
[1] "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<food>\n <fruits>\n <apple color=\"red\"/>\n </fruits>\n</food>\n"
[1] "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<food>\n <fruits>\n <apple color=\"red\"/>\n </fruits>\n</food>\n"
[1] "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<food>\n <fruits>\n <apple color=\"red\"/>\n </fruits>\n</food>\n"
*** caught segfault ***
address 0x55ff44000000, cause 'memory not mapped'
Possible actions:
1: abort (with core dump, if enabled)
2: normal R exit
3: exit R without saving workspace
4: exit R saving workspace
Selection:
R version 4.0.4 (2021-02-15)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.2 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=el_GR.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=el_GR.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=el_GR.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=el_GR.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] xml2_1.3.2 plumber_1.1.0.9000
loaded via a namespace (and not attached):
[1] compiler_4.0.4 magrittr_2.0.1 R6_2.5.0 later_1.2.0
[5] promises_1.2.0.1 tools_4.0.4 swagger_3.33.1 Rcpp_1.0.6
[9] stringi_1.6.1 jsonlite_1.7.2 webutils_1.1 lifecycle_1.0.0
[13] rlang_0.4.11
You can work around it, for example, by using xml_add_child and xml_replace instead.
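A minimal sketch of that workaround, assuming the goal is to wrap an existing node in a new parent. The helper name wrap_with_parent is hypothetical (it is not part of xml2). It relies on xml_add_child and xml_replace copying the inserted node by default, which is an assumption about why this route avoids the crash rather than a confirmed diagnosis:

```r
library(xml2)

# Hypothetical helper (not part of xml2): wrap `node` in a new parent
# element without calling xml_add_parent().
wrap_with_parent <- function(node, parent_xml) {
  wrapper <- read_xml(parent_xml)
  # Copy the node into the wrapper; xml_add_child() copies xml_node
  # values by default.
  xml_add_child(wrapper, node)
  # Swap the original node for the wrapper; xml_replace() also copies
  # by default (.copy = TRUE).
  xml_replace(node, wrapper)
  invisible(NULL)
}

# Small loop in the spirit of the stress test above.
for (i in 1:100) {
  doc <- read_xml("<parent><child1>Hello</child1></parent>")
  wrap_with_parent(xml_child(doc, 1), "<new_node>New text</new_node>")
}
print(xml_name(xml_children(doc)))
```

Because both calls copy the node, this does more allocation than xml_add_parent, so it may be slower on large documents.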
Hi, I'm also experiencing the problem of R crashing when xml_add_parent is used in combination with other code. As a minimal example, it crashes when the code below is run three times. When I originally found the problem, I was only calling xml_add_parent once in a script with many other function calls. However, I'm afraid I don't know how to create a minimal example for that case.
library(xml2)
# Create XML document
doc <- read_xml("<parent><child1>Hello</child1></parent>")
# Check current elements
children <- xml_children(doc)
new_node <- read_xml('<new_node>New text</new_node>')
xml_add_parent(children, new_node)
# Show that the parent node has been added
doc
#> {xml_document}
#> <parent>
#> [1] <new_node>New text<child1>Hello</child1></new_node>
If I run the above in a loop, it causes R to crash, e.g.:
library(xml2)
for (i in 1:3) {
  # Create XML document
  doc <- read_xml("<parent><child1>Hello</child1></parent>")
  # Check current elements
  children <- xml_children(doc)
  #expect_equal(xml_text(children), c("Hello"))
  new_node <- read_xml('<new_node>New text</new_node>')
  xml_add_parent(children, new_node)
  doc
}
reprex produces this:
This reprex appears to crash R. See standard output and standard error for more details.
Standard output and error
*** caught segfault ***
address 0x5610c8000000, cause 'memory not mapped'
An irrecoverable exception occurred. R is aborting now ...
OR this:
This reprex appears to crash R. See standard output and standard error for more details.
Standard output and error
free(): invalid pointer
Thanks to AleKoure for pointing out the workaround and helping me locate which part of my code was crashing my R session.
Created on 2021-06-02 by the reprex package (v2.0.0)
Session info
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#> setting value
#> version R version 4.0.4 (2021-02-15)
#> os CentOS Linux 8
#> system x86_64, linux-gnu
#> ui X11
#> language (EN)
#> collate en_US.UTF-8
#> ctype en_US.UTF-8
#> tz UTC
#> date 2021-06-02
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#> package * version date lib source
#> backports 1.2.1 2020-12-09 [2] CRAN (R 4.0.4)
#> cli 2.4.0 2021-04-05 [2] CRAN (R 4.0.4)
#> crayon 1.4.1 2021-02-08 [2] CRAN (R 4.0.4)
#> digest 0.6.27 2020-10-24 [2] CRAN (R 4.0.4)
#> ellipsis 0.3.1 2020-05-15 [2] CRAN (R 4.0.4)
#> evaluate 0.14 2019-05-28 [2] CRAN (R 4.0.4)
#> fansi 0.4.2 2021-01-15 [2] CRAN (R 4.0.4)
#> fs 1.5.0 2020-07-31 [2] CRAN (R 4.0.4)
#> glue 1.4.2 2020-08-27 [2] CRAN (R 4.0.4)
#> highr 0.9 2021-04-16 [2] CRAN (R 4.0.4)
#> htmltools 0.5.1.1 2021-01-22 [2] CRAN (R 4.0.4)
#> knitr 1.32 2021-04-14 [2] CRAN (R 4.0.4)
#> lifecycle 1.0.0 2021-02-15 [2] CRAN (R 4.0.4)
#> magrittr 2.0.1 2020-11-17 [2] CRAN (R 4.0.4)
#> pillar 1.6.0 2021-04-13 [2] CRAN (R 4.0.4)
#> pkgconfig 2.0.3 2019-09-22 [2] CRAN (R 4.0.4)
#> purrr 0.3.4 2020-04-17 [2] CRAN (R 4.0.4)
#> reprex 2.0.0 2021-04-02 [2] CRAN (R 4.0.4)
#> rlang 0.4.10 2020-12-30 [2] CRAN (R 4.0.4)
#> rmarkdown 2.7 2021-02-19 [2] CRAN (R 4.0.4)
#> sessioninfo 1.1.1 2018-11-05 [2] CRAN (R 4.0.4)
#> stringi 1.5.3 2020-09-09 [2] CRAN (R 4.0.4)
#> stringr 1.4.0 2019-02-10 [2] CRAN (R 4.0.4)
#> styler 1.4.1 2021-03-30 [2] CRAN (R 4.0.4)
#> tibble 3.1.1 2021-04-18 [2] CRAN (R 4.0.4)
#> utf8 1.2.1 2021-03-12 [2] CRAN (R 4.0.4)
#> vctrs 0.3.7 2021-03-29 [2] CRAN (R 4.0.4)
#> withr 2.4.2 2021-04-18 [2] CRAN (R 4.0.4)
#> xfun 0.22 2021-03-11 [2] CRAN (R 4.0.4)
#> xml2 * 1.3.2 2020-04-23 [2] CRAN (R 4.0.4)
#> yaml 2.2.1 2020-02-01 [2] CRAN (R 4.0.4)
#>
for (i in 1:500) {
  print(i)
  doc <- xml2::read_xml("<a><b>a</b></a>")
  children <- xml2::xml_children(doc)
  xml2::xml_add_parent(children, xml2::read_xml('<c>d</c>'))
}
On my machine, this loop gets as far as iteration 60 before a segfault is triggered.
The trigger is xml_add_parent. xml_add_child and xml_add_sibling won't trigger the segfault.
A possible solution: adding .copy = TRUE in xml_replace() makes the function stable across iterations, but I guess this will have some impact on performance.