
Extra space in "identifiers" block HTML

Open • martinthomson opened this issue 3 years ago • 3 comments

Describe the issue

The HTML rendering of the identifiers block (<dl class="identifiers">) includes a number of plain textual items, plus a few items that use nested elements. Some of the generated <dd> elements include additional whitespace before an initial, inline child element, which is hard (or maybe impossible) to remove with styling. This leads to misalignment in rendering.

Items that include this extra whitespace are:

  • <dd class="published">, which includes a <time> element as a child. (Though not <dd class="expires"> for some reason.)
  • <dd class="obsoletes"> and <dd class="updates">, which include <a> elements and text content.
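The affected items can be spotted mechanically: the stray whitespace shows up as whitespace-only text before the first child element of a `<dd>`. The sketch below is a minimal check, assuming the fragment is well-formed enough for `xml.etree.ElementTree` (a full xml2rfc HTML page would need an HTML parser instead). `SAMPLE` is a made-up fragment modelled on the serializer output quoted in the comments, not real xml2rfc output, and `dd_with_leading_whitespace` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Made-up fragment modelled on the serializer output quoted in this issue;
# dates and RFC numbers are illustrative only.
SAMPLE = """<dl class="identifiers">
<dt>Updates:</dt>
<dd class="updates">
<a href="https://www.rfc-editor.org/rfc/rfc2119" class="eref">2119</a> (if approved)</dd>
<dt>Published:</dt>
<dd class="published">
<time datetime="2022-08-23">23 August 2022</time>
</dd>
<dt>Expires:</dt>
<dd class="expires"><time datetime="2023-02-24">24 February 2023</time></dd>
</dl>"""


def dd_with_leading_whitespace(html: str) -> list[str]:
    """Return the class of each <dd> whose first child element is
    preceded by whitespace-only text (hypothetical helper)."""
    root = ET.fromstring(html)
    flagged = []
    for dd in root.iter("dd"):
        leading = dd.text or ""
        # Only flag elements that have a child and non-empty, all-space text.
        if len(dd) > 0 and leading != "" and leading.strip() == "":
            flagged.append(dd.get("class", ""))
    return flagged


print(dd_with_leading_whitespace(SAMPLE))  # ['updates', 'published']
```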

Can this extra space be removed?


martinthomson, Aug 23 '22

I did some digging on this and it seems like this is going to be HARD. The lxml library manages HTML serialization and when you enable the pretty_print option (as xml2rfc does, and should do), something in the creation of the updates/obsoletes element causes lxml to serialize the content of the <dd> element on the next line:

    <dd class="updates">
    <a href="https://www.rfc-editor.org/rfc/rfc2119" class="eref">2119</a> (if approved)</dd>

I couldn't work out how to suppress this. It seems to be caused by the element having text content. A single <a> element in updates/obsoletes renders properly once you remove the line that sets a.tail = ' ', but as soon as there are two or more links, or the document is a draft (where the tail is set to " (if approved)"), the <dd> has text content and lxml serializes it on a new line as shown.
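The cases above follow from the text/tail data model that lxml shares with the standard library's ElementTree: text that follows a child element is stored in that child's `.tail`, so the `<dd>` only gains text content through those tails. The sketch below uses `xml.etree.ElementTree` so it runs without lxml; `build_updates_dd` and `has_text_content` are hypothetical helpers, not xml2rfc code, and the `", "` separator between links is an assumption.

```python
import xml.etree.ElementTree as ET


def build_updates_dd(rfc_numbers, draft):
    """Hypothetical helper (not xml2rfc code): build <dd class="updates">
    with one <a> per RFC; separators and the draft suffix go in .tail."""
    dd = ET.Element("dd", {"class": "updates"})
    links = []
    for num in rfc_numbers:
        a = ET.SubElement(
            dd, "a",
            {"href": f"https://www.rfc-editor.org/rfc/rfc{num}", "class": "eref"},
        )
        a.text = str(num)
        links.append(a)
    # Text between two links is stored as the tail of the earlier <a>.
    for a in links[:-1]:
        a.tail = ", "  # assumed separator
    # For drafts, the suffix is stored as the tail of the last <a>.
    if draft and links:
        links[-1].tail = " (if approved)"
    return dd


def has_text_content(elem):
    """True if elem directly contains non-empty text: its own .text or
    the .tail of any of its children."""
    return bool(elem.text) or any(child.tail for child in elem)


for nums, draft in [([2119], False), ([2119, 8174], False), ([2119], True)]:
    dd = build_updates_dd(nums, draft)
    print(nums, draft, has_text_content(dd))
```

Only the first case, a single link with no tail, leaves the `<dd>` without text content, which matches the observation that this is the one case lxml serializes on a single line.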

I did manage to suppress the leading space on the "published" element by removing the tail on the <time> element. That tail is carried over from the original <date> element from which the <time> element is created, so it appears whenever <date> is followed by trailing text, which is usually just a newline. That's counter-intuitive, but it is a consequence of how the conversion works, so it can be tweaked:

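    # Publication date
    date = x.find('date')
    # Drop any trailing text (usually a newline) after <date>; otherwise it
    # is carried over as the tail of the generated <time> element.
    date.tail = None
    pubdate = self.render_date(None, date)
    entry(dl, 'Published', pubdate)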
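The effect of that tweak can be reproduced outside xml2rfc. The sketch below uses the standard library's ElementTree, which shares lxml's text/tail model; `render_date` and `published_dd` here are hypothetical simplifications written for illustration, not xml2rfc's actual code (whose `render_date` takes additional arguments), and they assume the tail is copied over verbatim.

```python
import xml.etree.ElementTree as ET

# Source fragment: the newline after <date/> is parsed as date.tail.
FRONT = """<front>
  <title>Example</title>
  <date year="2022" month="August" day="23"/>
</front>"""


def render_date(date):
    """Hypothetical stand-in for xml2rfc's render_date(): builds a <time>
    element and carries over the <date> element's tail, as described above."""
    time_el = ET.Element("time")
    time_el.text = " ".join(date.get(k, "") for k in ("day", "month", "year")).strip()
    time_el.tail = date.tail
    return time_el


def published_dd(front, clear_tail):
    """Build <dd class="published">, optionally clearing the <date> tail first."""
    date = front.find("date")
    if clear_tail:
        date.tail = None  # the tweak proposed above
    dd = ET.Element("dd", {"class": "published"})
    dd.append(render_date(date))
    return dd


before = published_dd(ET.fromstring(FRONT), clear_tail=False)
after = published_dd(ET.fromstring(FRONT), clear_tail=True)
print(repr(before[0].tail), repr(after[0].tail))  # '\n' None
```

With the tail cleared, the `<time>` element is the `<dd>`'s only child and the `<dd>` has no text content at all.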

martinthomson, Aug 23 '22

I now see id="identifiers", which clashes with IDs defined in the document itself:

https://github.com/ietf-wg-jsonpath/draft-ietf-jsonpath-base/issues/291
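Because HTML requires `id` values to be unique within a page, such a clash can be checked for directly in the rendered output. A minimal sketch using only the standard library's `html.parser`; `IdCollector`, `duplicate_ids`, and the sample page are hypothetical, and the sample assumes the document contains a section whose anchor is also `identifiers`.

```python
from collections import Counter
from html.parser import HTMLParser


class IdCollector(HTMLParser):
    """Record the value of every id attribute, in document order."""

    def __init__(self):
        super().__init__()
        self.ids = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        for name, value in attrs:
            if name == "id" and value:
                self.ids.append(value)


def duplicate_ids(html):
    """Return the sorted list of id values that occur more than once."""
    collector = IdCollector()
    collector.feed(html)
    collector.close()
    return sorted(i for i, n in Counter(collector.ids).items() if n > 1)


# Made-up page: the generated identifiers block plus a document section
# whose anchor happens to be "identifiers".
SAMPLE = """<dl id="identifiers" class="identifiers"><dt>Published:</dt><dd>...</dd></dl>
<section id="identifiers"><h2>Identifiers</h2></section>
<section id="intro"><h2>Introduction</h2></section>"""

print(duplicate_ids(SAMPLE))  # ['identifiers']
```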

cabo, Nov 03 '22

That id="identifiers" thing seems pretty serious and might be worth a separate issue.

(For this issue, I have a styling workaround. It is an abomination, but it works well enough, assuming you have CSS grid, flexbox, and a few other things that shouldn't be necessary but end up being essential.)

martinthomson, Nov 03 '22