xml2
Empty document if passing HTML comments
I've been using html_nodes in @tidyverse rvest to extract elements from parsed HTML, but I realized that if I accidentally pass in a string that is entirely commented out, read_html gives back what looks like a proper html_document, and then xml_find_all fails on it.
@carriewright11
string = "<!-- <img src=\"https://docs.google.com/\"/> -->"
doc = xml2::read_html(string)
doc
#> {html_document}
class(doc)
#> [1] "xml_document"
rvest::html_nodes(doc, xpath = "//img")
#> Error in UseMethod("xml_find_all"): no applicable method for 'xml_find_all' applied to an object of class "xml_document"
xml2::xml_find_all(doc, xpath = "//img")
#> Error in UseMethod("xml_find_all"): no applicable method for 'xml_find_all' applied to an object of class "xml_document"
Created on 2021-06-22 by the reprex package (v2.0.0)
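A possible stopgap, sketched under the assumption that the failure comes from the parsed document having no root node (its class above is only "xml_document", without "xml_node"). The helper name safe_find_all is hypothetical and not part of xml2 or rvest; it just guards the call and returns an empty list when there is nothing to search.

```r
library(xml2)

# Hypothetical helper (not part of xml2 or rvest): returns the matching nodes,
# or an empty list when the parsed document cannot be searched.
safe_find_all <- function(doc, xpath) {
  # In the reprex above the comment-only document has class "xml_document"
  # but not "xml_node", so xml_find_all() has no method to dispatch to.
  if (!inherits(doc, "xml_node")) {
    return(list())
  }
  xml_find_all(doc, xpath)
}

# Comment-only input: nothing to find, but no error either.
commented <- read_html("<!-- <img src=\"https://docs.google.com/\"/> -->")
n_commented <- length(safe_find_all(commented, "//img"))

# Ordinary input: the img element is found as usual.
regular <- read_html("<p><img src=\"https://docs.google.com/\"/></p>")
n_regular <- length(safe_find_all(regular, "//img"))
```

With this guard the commented-out case should give zero matches instead of an error. It only works around the symptom; having xml_find_all return an empty nodeset for such documents would still be the nicer behavior.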
Session info
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#> setting value
#> version R version 4.0.5 (2021-03-31)
#> os macOS Catalina 10.15.7
#> system x86_64, darwin17.0
#> ui X11
#> language (EN)
#> collate en_US.UTF-8
#> ctype en_US.UTF-8
#> tz America/New_York
#> date 2021-06-22
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#> package * version date lib source
#> backports 1.2.1 2020-12-09 [1] CRAN (R 4.0.2)
#> cli 2.5.0 2021-04-26 [1] CRAN (R 4.0.2)
#> crayon 1.4.1 2021-02-08 [1] CRAN (R 4.0.2)
#> digest 0.6.27 2020-10-24 [1] CRAN (R 4.0.2)
#> ellipsis 0.3.2 2021-04-29 [1] CRAN (R 4.0.2)
#> evaluate 0.14 2019-05-28 [2] CRAN (R 4.0.0)
#> fansi 0.4.2 2021-01-15 [1] CRAN (R 4.0.2)
#> fs 1.5.0 2020-07-31 [2] CRAN (R 4.0.2)
#> glue 1.4.2 2020-08-27 [1] CRAN (R 4.0.2)
#> highr 0.9 2021-04-16 [1] CRAN (R 4.0.2)
#> htmltools 0.5.1.1 2021-01-22 [1] CRAN (R 4.0.2)
#> httr 1.4.2 2020-07-20 [2] CRAN (R 4.0.2)
#> knitr 1.33 2021-04-24 [1] CRAN (R 4.0.2)
#> lifecycle 1.0.0 2021-02-15 [1] CRAN (R 4.0.2)
#> magrittr 2.0.1 2020-11-17 [1] CRAN (R 4.0.2)
#> pillar 1.6.0 2021-04-13 [1] CRAN (R 4.0.2)
#> pkgconfig 2.0.3 2019-09-22 [2] CRAN (R 4.0.0)
#> purrr 0.3.4 2020-04-17 [2] CRAN (R 4.0.0)
#> R6 2.5.0 2020-10-28 [1] CRAN (R 4.0.2)
#> reprex 2.0.0 2021-04-02 [1] CRAN (R 4.0.2)
#> rlang 0.4.11 2021-04-30 [1] CRAN (R 4.0.2)
#> rmarkdown 2.7 2021-02-19 [1] CRAN (R 4.0.2)
#> rvest 1.0.0 2021-03-09 [1] CRAN (R 4.0.2)
#> sessioninfo 1.1.1 2018-11-05 [2] CRAN (R 4.0.0)
#> stringi 1.5.3 2020-09-09 [1] CRAN (R 4.0.2)
#> stringr 1.4.0 2019-02-10 [2] CRAN (R 4.0.0)
#> styler 1.4.1 2021-03-30 [1] CRAN (R 4.0.2)
#> tibble 3.1.1 2021-04-18 [1] CRAN (R 4.0.2)
#> utf8 1.2.1 2021-03-12 [1] CRAN (R 4.0.2)
#> vctrs 0.3.7 2021-03-29 [1] CRAN (R 4.0.2)
#> withr 2.4.2 2021-04-18 [1] CRAN (R 4.0.2)
#> xfun 0.22 2021-03-11 [1] CRAN (R 4.0.2)
#> xml2 1.3.2 2020-04-23 [2] CRAN (R 4.0.0)
#> yaml 2.2.1 2020-02-01 [2] CRAN (R 4.0.0)
#>
#> [1] /Users/johnmuschelli/Library/R/4.0/library
#> [2] /Library/Frameworks/R.framework/Versions/4.0/Resources/library