rvest

Handling of `rowspan` in `html_table()`

Status: Open. mine-cetinkaya-rundel opened this issue 3 years ago.

The blog post announcing rvest 1.0.0 gives the following example of how `html_table()` handles row and column spans:

library(rvest)

html <- minimal_html("<table>
  <tr><th>A</th><th>B</th><th>C</th></tr>
  <tr><td colspan='2' rowspan='2'>1</td><td>2</td></tr>
  <tr><td rowspan='2'>3</td></tr>
  <tr><td>4</td></tr>
</table>")

html %>%
  html_element("table") %>%
  html_table()
#> # A tibble: 3 x 3
#>       A     B     C
#>   <int> <int> <int>
#> 1     1     1     2
#> 2     1     1     3
#> 3     4    NA     3
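To see where the output above comes from, here is a minimal base-R sketch of slot filling. It assumes the HTML table model's rule that each cell takes the first column in its row not already covered by a `rowspan` from an earlier row, and then covers `colspan` x `rowspan` slots. The `cell()` and `fill_grid()` helpers are hypothetical illustrations, not part of rvest, and this is not rvest's actual implementation.

```r
# Hypothetical cell description: text plus colspan/rowspan (1 when absent).
cell <- function(text, colspan = 1L, rowspan = 1L) {
  list(text = text, colspan = colspan, rowspan = rowspan)
}

# The three body rows of the example table, one list per <tr>.
rows <- list(
  list(cell("1", colspan = 2L, rowspan = 2L), cell("2")),
  list(cell("3", rowspan = 2L)),
  list(cell("4"))
)

# Assign each cell to grid slots; spans are clipped at the grid edges
# (an assumption made to keep the sketch simple).
fill_grid <- function(rows, ncol) {
  grid <- matrix(NA_character_, nrow = length(rows), ncol = ncol)
  for (i in seq_along(rows)) {
    j <- 1L
    for (cl in rows[[i]]) {
      # Skip slots already occupied by a rowspan from an earlier row
      while (j <= ncol && !is.na(grid[i, j])) j <- j + 1L
      if (j > ncol) break  # no room left in this row
      r_end <- min(i + cl$rowspan - 1L, nrow(grid))
      c_end <- min(j + cl$colspan - 1L, ncol)
      grid[i:r_end, j:c_end] <- cl$text
      j <- c_end + 1L
    }
  }
  grid
}

grid <- fill_grid(rows, ncol = 3L)
colnames(grid) <- c("A", "B", "C")
print(grid)
```

Under this rule, "3" is pushed into column C because columns A and B of the third row are still covered by "1", and "4" lands in column A of the last row, leaving B empty. That reproduces the tibble above; comparing this grid with the browser rendering is one way to see where the two disagree.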

However, when the same HTML is rendered in a browser, it doesn't produce a table like the one shown in the result above. To reproduce:

  • Go to https://www.w3schools.com/html/tryit.asp?filename=tryhtml_default
  • Paste the following HTML into the editor on the left side of the screen:
<!DOCTYPE html>
<html>
<style>
table, th, td {
  border: 1px solid black;
}
</style>
<body>

<table>
  <tr><th>A</th><th>B</th><th>C</th></tr>
  <tr><td colspan='2' rowspan='2'>1</td><td>2</td></tr>
  <tr><td rowspan='2'>3</td></tr>
  <tr><td>4</td></tr>
</table>

</body>
</html>
  • Click Run to view the result
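The same check can also be run locally instead of on w3schools. A minimal sketch, assuming a default browser is configured for `utils::browseURL()`: it writes the HTML above (with the border styling, moved into `<head>`) to a temporary file and opens it. The `html_src` and `path` names are illustrative only.

```r
# Reproduction HTML, with borders so cell boundaries are visible
html_src <- "<!DOCTYPE html>
<html>
<head>
<style>
table, th, td { border: 1px solid black; }
</style>
</head>
<body>
<table>
  <tr><th>A</th><th>B</th><th>C</th></tr>
  <tr><td colspan='2' rowspan='2'>1</td><td>2</td></tr>
  <tr><td rowspan='2'>3</td></tr>
  <tr><td>4</td></tr>
</table>
</body>
</html>"

# Write it to a temporary .html file
path <- tempfile(fileext = ".html")
writeLines(html_src, path)

# Open it in the default browser, but only in an interactive session
if (interactive()) utils::browseURL(path)
```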

Note that I added some CSS to draw a border around each cell, which makes it clearer where one cell ends and the next begins.

It appears to me that `rowspan` in this HTML doesn't actually behave the way `html_table()` assumes it does.

mine-cetinkaya-rundel, Aug 26 '21 18:08