xml2

Empty document if passing HTML comments

muschellij2 opened this issue on Jun 22, 2021 · 0 comments

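I've been using html_nodes() from tidyverse's rvest to extract elements from HTML I had already scraped. I realized that if I accidentally pass in a string that contains only an HTML comment, read_html() returns something that prints as an html_document, but xml_find_all() then fails on it: the returned object has class "xml_document" only, so there is no applicable method.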

cc @carriewright11

string = "<!-- <img src=\"https://docs.google.com/\"/> -->"
doc = xml2::read_html(string)
doc
#> {html_document}
class(doc)
#> [1] "xml_document"
rvest::html_nodes(doc, xpath = "//img")
#> Error in UseMethod("xml_find_all"): no applicable method for 'xml_find_all' applied to an object of class "xml_document"
xml2::xml_find_all(doc, xpath = "//img")
#> Error in UseMethod("xml_find_all"): no applicable method for 'xml_find_all' applied to an object of class "xml_document"

Created on 2021-06-22 by the reprex package (v2.0.0)

Session info
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value                       
#>  version  R version 4.0.5 (2021-03-31)
#>  os       macOS Catalina 10.15.7      
#>  system   x86_64, darwin17.0          
#>  ui       X11                         
#>  language (EN)                        
#>  collate  en_US.UTF-8                 
#>  ctype    en_US.UTF-8                 
#>  tz       America/New_York            
#>  date     2021-06-22                  
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version date       lib source        
#>  backports     1.2.1   2020-12-09 [1] CRAN (R 4.0.2)
#>  cli           2.5.0   2021-04-26 [1] CRAN (R 4.0.2)
#>  crayon        1.4.1   2021-02-08 [1] CRAN (R 4.0.2)
#>  digest        0.6.27  2020-10-24 [1] CRAN (R 4.0.2)
#>  ellipsis      0.3.2   2021-04-29 [1] CRAN (R 4.0.2)
#>  evaluate      0.14    2019-05-28 [2] CRAN (R 4.0.0)
#>  fansi         0.4.2   2021-01-15 [1] CRAN (R 4.0.2)
#>  fs            1.5.0   2020-07-31 [2] CRAN (R 4.0.2)
#>  glue          1.4.2   2020-08-27 [1] CRAN (R 4.0.2)
#>  highr         0.9     2021-04-16 [1] CRAN (R 4.0.2)
#>  htmltools     0.5.1.1 2021-01-22 [1] CRAN (R 4.0.2)
#>  httr          1.4.2   2020-07-20 [2] CRAN (R 4.0.2)
#>  knitr         1.33    2021-04-24 [1] CRAN (R 4.0.2)
#>  lifecycle     1.0.0   2021-02-15 [1] CRAN (R 4.0.2)
#>  magrittr      2.0.1   2020-11-17 [1] CRAN (R 4.0.2)
#>  pillar        1.6.0   2021-04-13 [1] CRAN (R 4.0.2)
#>  pkgconfig     2.0.3   2019-09-22 [2] CRAN (R 4.0.0)
#>  purrr         0.3.4   2020-04-17 [2] CRAN (R 4.0.0)
#>  R6            2.5.0   2020-10-28 [1] CRAN (R 4.0.2)
#>  reprex        2.0.0   2021-04-02 [1] CRAN (R 4.0.2)
#>  rlang         0.4.11  2021-04-30 [1] CRAN (R 4.0.2)
#>  rmarkdown     2.7     2021-02-19 [1] CRAN (R 4.0.2)
#>  rvest         1.0.0   2021-03-09 [1] CRAN (R 4.0.2)
#>  sessioninfo   1.1.1   2018-11-05 [2] CRAN (R 4.0.0)
#>  stringi       1.5.3   2020-09-09 [1] CRAN (R 4.0.2)
#>  stringr       1.4.0   2019-02-10 [2] CRAN (R 4.0.0)
#>  styler        1.4.1   2021-03-30 [1] CRAN (R 4.0.2)
#>  tibble        3.1.1   2021-04-18 [1] CRAN (R 4.0.2)
#>  utf8          1.2.1   2021-03-12 [1] CRAN (R 4.0.2)
#>  vctrs         0.3.7   2021-03-29 [1] CRAN (R 4.0.2)
#>  withr         2.4.2   2021-04-18 [1] CRAN (R 4.0.2)
#>  xfun          0.22    2021-03-11 [1] CRAN (R 4.0.2)
#>  xml2          1.3.2   2020-04-23 [2] CRAN (R 4.0.0)
#>  yaml          2.2.1   2020-02-01 [2] CRAN (R 4.0.0)
#> 
#> [1] /Users/johnmuschelli/Library/R/4.0/library
#> [2] /Library/Frameworks/R.framework/Versions/4.0/Resources/library
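A possible workaround, sketched under my own assumptions and not an official fix: wrapping the fragment in a container element (the <div> wrapper is a hypothetical choice, nothing xml2 or rvest prescribes) should make the HTML parser create a root node, so xml_find_all() dispatches normally and simply finds no <img>, because the commented-out tag is never parsed as an element.

```r
library(xml2)

html_string <- "<!-- <img src=\"https://docs.google.com/\"/> -->"

# In the reprex above, parsing this comment-only string gives an object whose
# class is just "xml_document" (no "xml_node"), so xml_find_all() has no
# method to dispatch to.

# Workaround sketch (an assumption, not an official fix): wrap the fragment in
# a container element so the HTML parser always creates a root node.
wrapped_doc <- read_html(paste0("<div>", html_string, "</div>"))

# The wrapped document is an xml_node, so the query dispatches normally and
# the commented-out <img> is simply not matched.
imgs <- xml_find_all(wrapped_doc, "//img")
length(imgs)
```

The same wrapped document can also be passed to rvest::html_nodes(wrapped_doc, xpath = "//img").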

muschellij2 · Jun 22 '21 15:06