Formatting issue in LREC-COLING proceedings
Confirm that this is a bug report
- [X] I want to report an issue that does not concern paper or author metadata.
- [X] I have searched for similar existing issues first.
Problem Description
Partway down https://aclanthology.org/events/coling-2024/ we see a stray bold tag that messes with layout:
The cause is probably an empty <b/> tag in the preceding paper's abstract.
That’s really tricky, as it’s not disallowed by the RELAX NG schema (text can be the empty string), but also not easily enforceable in the schema (replacing text with a pattern like xsd:string {minLength="1"} results in an error in compiling the schema, most likely due to an inherent limitation of RELAX NG).
We should probably add an explicit check for these edge cases somewhere, though.
This really is tricky, as the proper test would be that the string neither opens nor closes a tag on net, while tags themselves are allowed. This is probably not achievable with a finite-state automaton (FSA) and therefore not in RELAX NG; it would need to be checked at read time in Python.
Edit: what I wrote needs to be checked as well, but I misread b/ as /b -- I frankly don't even understand why that breaks the rendering.
Closing tags are mandatory in HTML 5, I believe, except for a pre-defined list of "void" elements such as <br />. Since "b" is not a void element, I guess the browser engines "correct" the <b/> into a plain opening <b> tag.
If <b/> or <i/> etc. occur in an abstract it's probably an error, so it would be good to flag it.
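A check like that could be sketched roughly as follows. This is only an illustration, not the Anthology's actual tooling: the function name `find_empty_markup`, the set of inline tags, and the choice of `title`/`abstract` as containers are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Assumption: inline markup tags that should never be empty.
# The real schema may allow more (or fewer) tags than these.
INLINE_TAGS = {"b", "i", "u"}


def find_empty_markup(root, containers=("title", "abstract"), inline_tags=INLINE_TAGS):
    """Yield (container_tag, inline_tag) for every empty inline element.

    An element counts as empty if it has no children and no
    non-whitespace text. ElementTree parses <b/> and <b></b>
    identically, so both forms are flagged.
    """
    for container in root.iter():
        if container.tag not in containers:
            continue
        for elem in container.iter():
            if elem is container:
                continue
            if (
                elem.tag in inline_tags
                and len(elem) == 0
                and not (elem.text or "").strip()
            ):
                yield container.tag, elem.tag


# Hypothetical input, loosely modelled on a paper entry.
doc = ET.fromstring(
    "<paper><title>A <i>fine</i> title</title>"
    "<abstract>Text <b/> more <i></i> text <b>ok</b>.</abstract></paper>"
)
print(list(find_empty_markup(doc)))
# [('abstract', 'b'), ('abstract', 'i')]
```

Something along these lines could run at ingestion time or in CI, since the schema itself apparently cannot express the constraint.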
Given an empty instance of an element whose content model is not EMPTY (for example, an empty title or paragraph) do not use the minimized form (e.g. use <p> </p> and not <p />).
HTML 5 says about the structure of start tags:
Then, if the element is one of the void elements, or if the element is a foreign element, then there may be a single U+002F SOLIDUS character (/). This character has no effect on void elements, but on foreign elements it marks the start tag as self-closing.
In other words, for "normal elements" such as <b>, it does not say that the slash marks it as self-closing...
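Independently of flagging the data, the rendering side could avoid ever emitting the minimized form for non-void elements. As a minimal sketch, assuming the markup is available as an ElementTree element (the helper name `to_html_fragment` is made up here, and this is not necessarily how the site is actually generated), Python's standard serializer already does this when asked for HTML output:

```python
import xml.etree.ElementTree as ET


def to_html_fragment(elem):
    """Serialize an element using HTML rules instead of XML rules.

    With method="html", ElementTree writes an explicit end tag for
    every element that is not an HTML void element, so an empty <b>
    never comes out in the minimized <b /> form.
    """
    return ET.tostring(elem, encoding="unicode", method="html")


abstract = ET.fromstring("<abstract>foo <b/> bar</abstract>")

# Default XML serialization keeps the minimized form:
print(ET.tostring(abstract, encoding="unicode"))
# <abstract>foo <b /> bar</abstract>

# HTML serialization writes an explicit open/close pair:
print(to_html_fragment(abstract))
# <abstract>foo <b></b> bar</abstract>
```

This would only hide the symptom in the browser; an empty bold element in an abstract is still most likely a data error worth flagging.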
(I may have gone a bit overboard digging into this.)