jsoup

The ruby rtc element is incorrectly processed.

Open rhdunn opened this issue 6 years ago • 0 comments

Given the markup in the example from https://www.w3.org/TR/2001/REC-ruby-20010531/#complex:

<ruby>
  <rbc>
    <rb>10</rb>
    <rb>31</rb>
    <rb>2002</rb>
  </rbc>
  <rtc>
    <rt>Month</rt>
    <rt>Day</rt>
    <rt>Year</rt>
  </rtc>
  <rtc>
    <rt rbspan="3">Expiration Date</rt>
  </rtc>
</ruby>

the jsoup parser treats the rtc element as an unknown element that gets closed immediately. As a result, the rt elements end up as siblings of an empty rtc, and the output in XML mode is:

<rtc></rtc><rt>Month</rt><rt>Day</rt><rt>Year</rt>
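A minimal reproduction sketch, assuming jsoup is on the classpath and that "xml mode" above means the XML output syntax set through the output settings. The class name `RtcRepro` and the helper `serializeAsXml` are made up for illustration. The output is not shown here because it depends on the jsoup version in use.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class RtcRepro {
    // Parse with jsoup's default HTML parser, then serialize the body
    // using XML output syntax with pretty-printing turned off.
    static String serializeAsXml(String html) {
        Document doc = Jsoup.parse(html);
        doc.outputSettings()
           .syntax(Document.OutputSettings.Syntax.xml)
           .prettyPrint(false);
        return doc.body().html();
    }

    public static void main(String[] args) {
        // The complex ruby markup from the Ruby Annotation specification example.
        String html = "<ruby>"
            + "<rbc><rb>10</rb><rb>31</rb><rb>2002</rb></rbc>"
            + "<rtc><rt>Month</rt><rt>Day</rt><rt>Year</rt></rtc>"
            + "<rtc><rt rbspan=\"3\">Expiration Date</rt></rtc>"
            + "</ruby>";

        System.out.println(serializeAsXml(html));

        // An rtc with zero child elements means its rt children were hoisted out.
        Document doc = Jsoup.parse(html);
        for (Element rtc : doc.select("rtc")) {
            System.out.println("rtc child elements: " + rtc.children().size());
        }
    }
}
```

If the parser behaves as described in this issue, the first rtc should report zero child elements.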

I have checked the behaviour of Firefox and Chrome, and they preserve the rtc element structure, e.g.:

<rtc><rt>Month</rt><rt>Day</rt><rt>Year</rt></rtc>
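The difference between the two serializations can be checked mechanically. The sketch below uses only the JDK's XML parser, not jsoup, and counts the rt children of each rtc element. The class name `RtcStructureCheck` and the helper `rtCountsPerRtc` are made up for illustration. It assumes the fragment is well-formed XML once wrapped in a dummy root element.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class RtcStructureCheck {
    // For each rtc element in the fragment, count its direct rt element children.
    static List<Integer> rtCountsPerRtc(String fragment) {
        // Wrap in a dummy root so a fragment with several top-level elements parses.
        String xml = "<root>" + fragment + "</root>";
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("fragment is not well-formed XML", e);
        }
        List<Integer> counts = new ArrayList<>();
        NodeList rtcs = doc.getElementsByTagName("rtc");
        for (int i = 0; i < rtcs.getLength(); i++) {
            int rtChildren = 0;
            NodeList kids = rtcs.item(i).getChildNodes();
            for (int j = 0; j < kids.getLength(); j++) {
                Node kid = kids.item(j);
                if (kid.getNodeType() == Node.ELEMENT_NODE && "rt".equals(kid.getNodeName())) {
                    rtChildren++;
                }
            }
            counts.add(rtChildren);
        }
        return counts;
    }

    public static void main(String[] args) {
        String jsoupOutput = "<rtc></rtc><rt>Month</rt><rt>Day</rt><rt>Year</rt>";
        String browserOutput = "<rtc><rt>Month</rt><rt>Day</rt><rt>Year</rt></rtc>";
        System.out.println(rtCountsPerRtc(jsoupOutput));   // prints [0]
        System.out.println(rtCountsPerRtc(browserOutput)); // prints [3]
    }
}
```

A count of 0 for an rtc that had rt children in the source indicates the structure was lost during parsing.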

The rtc element is supported in the W3C HTML spec [1], but not the WHATWG spec. Also, even though the rbc element is not listed in either of those (only in the Ruby Annotations specification), the jsoup parser preserves the rbc element structure.

[1] https://www.w3.org/TR/2014/REC-html5-20141028/text-level-semantics.html#the-rtc-element

rhdunn, Jan 07 '20 16:01