rvest
Handling of `rowspan` in `html_table()`
The rvest 1.0.0 blog post gives the following example of how row and column spans are handled:
```r
library(rvest)

html <- minimal_html("<table>
<tr><th>A</th><th>B</th><th>C</th></tr>
<tr><td colspan='2' rowspan='2'>1</td><td>2</td></tr>
<tr><td rowspan='2'>3</td></tr>
<tr><td>4</td></tr>
</table>")

html %>%
  html_element("table") %>%
  html_table()
#> # A tibble: 3 x 3
#>       A     B     C
#>   <int> <int> <int>
#> 1     1     1     2
#> 2     1     1     3
#> 3     4    NA     3
```
However, when rendered in a browser, this HTML does not produce a table like the one shown in the output above. To reproduce:
- Go to https://www.w3schools.com/html/tryit.asp?filename=tryhtml_default
- Paste the following into the editor on the left side of the screen:
```html
<!DOCTYPE html>
<html>
<style>
table, th, td {
  border: 1px solid black;
}
</style>
<body>
<table>
<tr><th>A</th><th>B</th><th>C</th></tr>
<tr><td colspan='2' rowspan='2'>1</td><td>2</td></tr>
<tr><td rowspan='2'>3</td></tr>
<tr><td>4</td></tr>
</table>
</body>
</html>
```
- Click Run to view the result
Note that I added some CSS to draw a border around each cell, which makes it clearer where one cell ends and the next begins.
It appears to me that `rowspan` in the HTML code doesn't actually do what `html_table()` assumes it does.
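To compare the raw markup with the `html_table()` output without a browser, one option is to list every cell together with the `<tr>` it comes from and its span attributes. This is only an illustrative sketch, assuming rvest >= 1.0.0 (for `html_elements()` and `html_text2()`); the names `rows` and `cells` are arbitrary, and a missing `colspan`/`rowspan` attribute is treated as 1, the HTML default.

```r
library(rvest)

html <- minimal_html("<table>
<tr><th>A</th><th>B</th><th>C</th></tr>
<tr><td colspan='2' rowspan='2'>1</td><td>2</td></tr>
<tr><td rowspan='2'>3</td></tr>
<tr><td>4</td></tr>
</table>")

# All <tr> elements, in document order
rows <- html_elements(html, "tr")

# One data frame row per cell, recording which <tr> it starts in
# and its span attributes (missing attributes default to "1")
cells <- do.call(rbind, lapply(seq_along(rows), function(i) {
  tds <- html_elements(rows[[i]], "th, td")
  data.frame(
    tr      = i,
    text    = html_text2(tds),
    colspan = as.integer(html_attr(tds, "colspan", default = "1")),
    rowspan = as.integer(html_attr(tds, "rowspan", default = "1"))
  )
}))

print(cells)
```

Printing `cells` makes it easy to see which cells actually start in each `<tr>`; for example, the last `<tr>` contains only the cell `4`, so its position depends on how the earlier `rowspan` cells are carried down.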