The ruby rtc element is incorrectly processed.
Given the markup in the example from https://www.w3.org/TR/2001/REC-ruby-20010531/#complex:
<ruby>
<rbc>
<rb>10</rb>
<rb>31</rb>
<rb>2002</rb>
</rbc>
<rtc>
<rt>Month</rt>
<rt>Day</rt>
<rt>Year</rt>
</rtc>
<rtc>
<rt rbspan="3">Expiration Date</rt>
</rtc>
</ruby>
the jsoup parser treats the rtc element as an unknown element and closes it immediately, so its rt children end up as siblings of the rtc element instead of children. Serialized in XML mode, the first rtc group comes out as:
<rtc></rtc><rt>Month</rt><rt>Day</rt><rt>Year</rt>
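The serialization above can be reproduced with a minimal sketch along these lines. The class and method names (`RubyRtcRepro`, `serialize`, `firstRtcChildCount`) are made up for this report and are not part of jsoup. The sketch assumes the document is parsed with the default HTML parser and only the output syntax is switched to XML. What it prints depends on the jsoup version in use.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Hypothetical reproduction class; the name is illustrative, not part of jsoup.
class RubyRtcRepro {
    // The complex ruby example from the Ruby Annotation spec, written on one line.
    static final String MARKUP =
        "<ruby>"
        + "<rbc><rb>10</rb><rb>31</rb><rb>2002</rb></rbc>"
        + "<rtc><rt>Month</rt><rt>Day</rt><rt>Year</rt></rtc>"
        + "<rtc><rt rbspan=\"3\">Expiration Date</rt></rtc>"
        + "</ruby>";

    // Parse with the HTML parser, then serialize the body using XML output syntax.
    static String serialize(String html) {
        Document doc = Jsoup.parse(html);
        doc.outputSettings()
           .syntax(Document.OutputSettings.Syntax.xml)
           .prettyPrint(false);
        return doc.body().html();
    }

    // Count the element children of the first rtc element (-1 if none was parsed).
    static int firstRtcChildCount(String html) {
        Element rtc = Jsoup.parse(html).select("rtc").first();
        return rtc == null ? -1 : rtc.children().size();
    }

    public static void main(String[] args) {
        System.out.println(serialize(MARKUP));
        System.out.println("children of first rtc: " + firstRtcChildCount(MARKUP));
    }
}
```

If the rtc structure is preserved, the child count for the first rtc should be 3. A count of 0 corresponds to the behaviour described in this report.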
I have checked the behaviour of Firefox and Chrome, and they preserve the rtc element structure, e.g.:
<rtc><rt>Month</rt><rt>Day</rt><rt>Year</rt></rtc>
The rtc element is supported in the W3C HTML spec [1], but not the WHATWG spec. Also, even though the rbc element is not listed in either of those (only in the Ruby Annotations specification), the jsoup parser preserves the rbc element structure.
[1] https://www.w3.org/TR/2014/REC-html5-20141028/text-level-semantics.html#the-rtc-element