xml2
[FR] How to create an S3 vector (using `{vctrs}`) whose prototype is an `xml_node` object?
I'd like to use an `xml_nodeset` object as a column in a data frame.
Here are my naive attempts:
library(xml2)
library(tibble)
library(dplyr)
x <- read_xml("<parent><child>1</child><child>2<child>3</child></child><manzz>1</manzz></parent>")
xml_nodeset_obj <- xml_children(x)
xml_nodeset_obj_unclassed <- unclass(xml_nodeset_obj)
# Won't work
tibble(xml_nodeset_obj)
tbl <- tibble(child = xml_nodeset_obj_unclassed)
tbl |>
  mutate(child = xml_find_all(child, xpath = './child'))
# Post tibble creation cheating won't work either :)
class(tbl$child) <- "xml_nodeset"
After reading one of the errors obtained with the code above:
Error in `vec_size()`:
! `x` must be a vector, not a <xml_nodeset> object.
Run `rlang::last_error()` to see where the error occurred.
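The error above can be reproduced in isolation. The following is a minimal sketch, assuming {xml2}, {vctrs} and {tibble} are installed; the XML snippet and the variable names are illustrative only. It checks that {vctrs} does not regard a classed `xml_nodeset` as a vector, while the bare list obtained with `unclass()` is accepted as a list column.

```r
library(xml2)
library(vctrs)
library(tibble)

doc <- read_xml("<parent><child>1</child><child>2</child><other>3</other></parent>")
nodeset <- xml_children(doc)

# An xml_nodeset is a classed list with no vctrs methods, so vctrs
# (and therefore tibble) does not treat it as a vector.
is_vec_classed <- vec_is(nodeset)

# Dropping the class leaves a bare list of xml_node objects, which is a vector.
node_list <- unclass(nodeset)
is_vec_bare <- vec_is(node_list)

# A bare list is accepted as a list column; per-row values are then
# extracted element by element.
tbl <- tibble(node = node_list)
tbl$name <- vapply(tbl$node, xml_name, character(1))
tbl$text <- vapply(tbl$node, xml_text, character(1))
```

This only works around the problem with a plain list column; it does not give the column a type of its own, which is what the rest of this issue is about.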
I understand now that it might be possible to make an `xml_nodeset` object behave as a vector by following the instructions provided here: S3 vectors, right?
Should I try to implement this myself in a package of my own, or is this functionality desirable in {xml2}?
My objective is to provide similar functionality to tidyjson but for XML data.
Would you be so kind as to provide feedback on my approach to making an S3 vector out of the `xml_node` type? Bear with me, as I have only just read the S3 vectors vignette, specifically the part on list-of types.
I am declaring the prototype as `structure(logical(), class = 'xml_node')`, using `logical()` as a dummy. I am not sure how to do this properly, given that {xml2} does not (?) provide a function to instantiate an `xml_node` object.
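One possible way to sidestep the dummy prototype, sketched here as an assumption rather than a recommendation: build the class directly on a list with `vctrs::new_vctr()` and check the element type in the constructor. The names `new_node_list` and `XXX_node_list` are hypothetical and do not exist in {xml2} or {vctrs}.

```r
library(xml2)
library(vctrs)

# Hypothetical constructor: a list-based vctrs class with no declared prototype.
new_node_list <- function(x = list()) {
  stopifnot(is.list(x))
  # Every element must be an xml_node; this check stands in for the
  # element-type guarantee that a list_of prototype would otherwise express.
  ok <- vapply(x, inherits, logical(1), what = "xml_node")
  if (!all(ok)) stop("All elements must be <xml_node> objects.", call. = FALSE)
  new_vctr(x, class = "XXX_node_list")
}

doc <- read_xml("<parent><child>1</child><child>2</child></parent>")
nodes <- new_node_list(unclass(xml_children(doc)))
```

The trade-off is that the element type is enforced only by this constructor, not recorded in a `ptype` attribute as it would be with a list-of type.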
For some reason the tibble is not showing the elements of column `x` (see below).
The prefix `XXX` is a placeholder for a hypothetical R package where the class `XXX_xml_node` would be registered.
library(xml2)
library(tibble)
library(vctrs)
#>
#> Attaching package: 'vctrs'
#> The following object is masked from 'package:tibble':
#>
#> data_frame
new_XXX_xml_node <- function(x) {
  vctrs::new_list_of(x,
                     ptype = structure(logical(), class = 'xml_node'),
                     class = "XXX_xml_node")
}
XXX_xml_node <- function(x) {
  new_XXX_xml_node(x)
}
vec_ptype_full.XXX_xml_node <- function(x, ...) "XXX_xml_node"
vec_ptype_abbr.XXX_xml_node <- function(x, ...) "xml_node"
as_XXX_xml_node <- function(x, ...) UseMethod("as_XXX_xml_node")
as_XXX_xml_node.xml_node <- function(x, ...) {
  XXX_xml_node(x)
}
as_XXX_xml_node.xml_nodeset <- function(x, ...) {
  XXX_xml_node(unclass(x))
}
as_XXX_xml_node.xml_document <- function(x, ...) {
  # Convert xml_document to a list of xml_node objects
  xx <- unclass(xml2::xml_children(x))
  XXX_xml_node(xx)
}
format.XXX_xml_node <- function(x, ...) {
  desc <-
    encodeString(vapply(x, as.character, FUN.VALUE = character(1)))
  paste0(substr(desc, 1, 20 - 3), "...")
}
obj_print_data.XXX_xml_node <- function(x, ...) {
  if (length(x) == 0)
    return()
  print(format(x), quote = FALSE)
}
# Example application
x <- read_xml("
<parent>
  <child>1</child>
  <child>2</child>
  <child>3</child>
  <child>4</child>
  <child>
    <grandchildren>5.1</grandchildren>
    <grandchildren>5.2</grandchildren>
    <grandchildren>5.3</grandchildren>
  </child>
  <child>
    <grandchildren>6.1</grandchildren>
    <grandchildren>6.2</grandchildren>
    <child>6.2</child>
  </child>
</parent>")
as_XXX_xml_node(x)
#> <XXX_xml_node[6]>
#> [1] <child>1</child>... <child>2</child>... <child>3</child>...
#> [4] <child>4</child>... <child>\n  <grand... <child>\n  <grand...
(tbl <- tibble::tibble(x = as_XXX_xml_node(x), i = seq_along(x)))
#> # A tibble: 6 × 2
#>   x              i
#>   <xml_node> <int>
#> 1                1
#> 2                2
#> 3                3
#> 4                4
#> 5                5
#> 6                6
Closing in favour of #377. I don't have vctrs loaded in my brain at the moment, so I can't offer any concrete feedback on what you tried, but I think we should just do this right in the package so that you and others don't need to worry about it.