xml2
xml2 copied to clipboard
The GoT challenge
Transferring from Slack convo.
The GoT data in repurrrsive comes from an API but for teaching purposes, after I wrangle it into the list that ships with the package, I transform it back into JSON and XML files that also ship with the package.
The current script below data-raw/
uses an experimental branch of xml2 I created before xml2 had a nice function for this. I implemented as_xml()
and the moral equivalent in xml2 today is as_xml_document()
. But I can't just drop it in there.
https://github.com/jennybc/repurrrsive/blob/7ecc79f9a6b1925996ec20d78f0291d91451db31/data-raw/got_pov-chars-wrangle.R#L5-L7
https://github.com/jennybc/repurrrsive/blob/7ecc79f9a6b1925996ec20d78f0291d91451db31/data-raw/got_pov-chars-wrangle.R#L116-L120
> got_chars %>%
+ #xml2:::as_xml() %>%
+ as_xml_document() %>%
+ as.character() %>% ## workaround to get explicit UTF-8 encoding
+ read_xml() %>%
+ write_xml(here("inst", "extdata", "got_chars.xml"))
Error: Root nodes must be of length 1
I haven't looked at this lately but I think there's either a buglet in as_xml_document()
or a UI/documentation gap. It would be nice if I could reproduce the XML already in repurrrsive with current xml2 functions. This is a little detective adventure for someone.
Progress on this issue could take many different forms, including:
- Reproduce the problem for yourself and construct a better, smaller reprex and diagnosis. What's different about the GoT example and the working examples and tests for
as_xml_document()
? - Identify the usage problem in the above code and suggest docs or examples to help people avoid same mistake
- Find an actual bug
- Fix that bug
@jennybc, I'm confused about the labels. Is this issue related to a specific event or anyone can have a go at it anytime?
@Bisaloo you are welcome to look into it anytime if you like!
But if it is not addressed by the next tidy-dev-day we wanted to mark it as one people could try then.
I've created a reprex for this issue using the example from the demo of as_xml() @jennybc did back then.
library(xml2)
y_list <- list(
setNames(letters[1:4], letters[1:4]),
two = list(
setNames(1:4 * 1.1, month.abb[1:4]),
hey = c(TRUE, FALSE)
),
three = NULL
)
attr(y_list$two, "attrib1") <- "hi"
attr(y_list$two, "attrib2") <- "hey"
y_xml <- as_xml_document(y_list)
#> Error: Root nodes must be of length 1
Created on 2019-07-08 by the reprex package (v0.3.0)
I've created a reprex for this issue using the example from the demo of as_xml() @jennybc did back then.
library(xml2) y_list <- list( setNames(letters[1:4], letters[1:4]), two = list( setNames(1:4 * 1.1, month.abb[1:4]), hey = c(TRUE, FALSE) ), three = NULL ) attr(y_list$two, "attrib1") <- "hi" attr(y_list$two, "attrib2") <- "hey" y_xml <- as_xml_document(y_list) #> Error: Root nodes must be of length 1
Created on 2019-07-08 by the reprex package (v0.3.0)
A way around the issue above is by enclosing y_list
into a list
but then it hits another error
library(xml2)
y_list <- list(
setNames(letters[1:4], letters[1:4]),
two = list(
setNames(1:4 * 1.1, month.abb[1:4]),
hey = c(TRUE, FALSE)
),
three = NULL
)
attr(y_list$two, "attrib1") <- "hi"
attr(y_list$two, "attrib2") <- "hey"
y_xml <- as_xml_document(list(y_list = y_list))
#> Error in node_append_content(x$node, value): Expecting a single string value: [type=character; extent=4].
Created on 2019-07-08 by the reprex package (v0.3.0)
After much digging, it turns out as_xml_document()
cannot handle the case where the list has content in it while as_xml()
does.
This is consistent with the documentation of the function:
This turns an R list into the equivalent XML document. Not all R lists will produce valid XML, in particular there can only be one root node and all child nodes need to be named (or empty) lists. R attributes become XML attributes and R names become XML node names.
library(xml2)
y_list_empty <- list(
one = list(),
two = list(),
three = list()
)
attr(y_list_empty$two, "attrib1") <- "hi"
attr(y_list_empty$two, "attrib2") <- "hey"
y_xml_empty <- as_xml_document(list(y_list_empty = y_list_empty))
y_xml_empty
#> {xml_document}
#> <y_list_empty>
#> [1] <one/>
#> [2] <two attrib1="hi" attrib2="hey"/>
#> [3] <three/>
y_list_full <- list(
one = list(setNames(letters[1:4], letters[1:4])),
two = list(),
three = list()
)
attr(y_list_full$two, "attrib1") <- "hi"
attr(y_list_full$two, "attrib2") <- "hey"
y_xml_full <- as_xml_document(list(y_list_full = y_list_full))
#> Error in node_append_content(x$node, value): Expecting a single string value: [type=character; extent=4].
Created on 2019-07-08 by the reprex package (v0.3.0)
That very last case can be handled by as_xml()
with the result below:
> as_xml(y_list_full)
{xml_document}
<root>
[1] <one>\n <elem>\n <a>a</a>\n <b>b</b>\n <c>c</c>\n <d>d</d>\n </elem>\n</one>
[2] <two attrib1="hi" attrib2="hey"/>
[3] <three/>
In my opinion, it needs to be discussed whether or not as_xml_document()
is supposed to handle this scenario or not. If yes, then its capabilities need to be expanded to match the capabilities of as_xml()
from @jennybc or if populating the xml file with elements is the job of another function.
Thanks @R3myG for pushing this investigation along.