Clarity on "well-formed XML" in Presentation API

Open scossu opened this issue 2 years ago • 3 comments

In https://iiif.io/api/presentation/3.0/#45-html-markup-in-property-values :

The content must be well-formed XML and therefore must be wrapped in an element such as p or span.

And, further down,

Clients should allow only a, b, br, i, img, p, small, span, sub and sup tags.

The language is the same in 2.x.

I am unsure how to interpret the first quote in light of the second, especially when the content consists of multiple paragraph blocks. Should I read it as a) "the whole content must be wrapped in a single element", or b) "there must be no dangling tags and no loose text outside of tags"?

Since there is no recommended block-level tag type that can wrap p tags, I lean toward the second one. To me, <p>Paragraph 1</p><p>Paragraph 2</p> is valid XML (just not a complete XML document, which would need other elements anyway). However, that is not quite clear, and this interpretation has been the subject of discussion recently. Any clarification in the text would help.

scossu avatar Apr 06 '22 18:04 scossu

Good point @scossu. I think we intended both that the XML snippet be valid (as your two-paragraph example is) and that it be wrapped in a single element (not the case for your example). To me this is implied by:

MUST be wrapped in an [ie. a single] element such as p or span.

I think the looseness is the "therefore" because "well-formed XML" [XML what?] is not quite specific enough.

zimeon avatar Apr 08 '22 15:04 zimeon

If I understand correctly, then multiple paragraphs are not allowed, or at least not recommended, since the only HTML element that can wrap them would be a block-level one such as div or section, which is not listed in the "recommended" tags further down. Right?
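To make the conflict concrete, here is a minimal sketch (not part of the spec) that checks a value against the tag list quoted above. The helper name `disallowed_tags` is hypothetical, and Python's standard `xml.etree.ElementTree` is used only as a convenient XML parser:

```python
import xml.etree.ElementTree as ET

# Tag list quoted from the Presentation API text above.
ALLOWED_TAGS = {"a", "b", "br", "i", "img", "p", "small", "span", "sub", "sup"}

def disallowed_tags(markup: str) -> set:
    """Return the element names in `markup` that are not in ALLOWED_TAGS.

    Hypothetical helper: assumes `markup` already has a single root element,
    otherwise ET.fromstring raises ParseError.
    """
    root = ET.fromstring(markup)
    # root.iter() visits the root element itself and all of its descendants.
    return {element.tag for element in root.iter()} - ALLOWED_TAGS

# Wrapping two paragraphs in a div gives a single root, but div is not listed.
print(disallowed_tags("<div><p>Paragraph 1</p><p>Paragraph 2</p></div>"))
# A single span with a line break stays within the listed tags.
print(disallowed_tags("<span>Paragraph 1<br/>Paragraph 2</span>"))
```

So the only single-root wrapper for several p elements that is also conventional HTML falls outside the listed tags.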

scossu avatar Apr 08 '22 17:04 scossu

Good point about the block level elements not being listed.

The intent, as far as I recall, was: a single element that can be passed to an XML parser for manipulation (e.g. it could be in a browser, which will accept garbage, but could also be outside a browser, where the parser might throw an error unless the input is a well-formed document). The list of tags was then chosen to emphasize presentation and to limit choice, and thus reduce complexity for testing compliance.
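As an illustration of the "outside a browser" case, here is a rough sketch using Python's standard `xml.etree.ElementTree`. The two helper names are hypothetical and simply mirror the two interpretations discussed in this thread:

```python
import xml.etree.ElementTree as ET

def parses_as_single_element(markup: str) -> bool:
    """Interpretation (a): the value parses as one XML element (a single root)."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True

def is_balanced_fragment(markup: str) -> bool:
    """Interpretation (b): no dangling tags, checked under a synthetic root.

    The wrapper element name "root" is an arbitrary choice for this sketch.
    """
    return parses_as_single_element(f"<root>{markup}</root>")

two_paragraphs = "<p>Paragraph 1</p><p>Paragraph 2</p>"
wrapped = "<span>Paragraph 1<br/>Paragraph 2</span>"

print(parses_as_single_element(two_paragraphs))  # strict parser rejects two roots
print(is_balanced_fragment(two_paragraphs))      # but the tags are balanced
print(parses_as_single_element(wrapped))         # single root parses
```

A strict parser rejects the two-paragraph value on its own because a document can have only one root element, even though every tag in it is properly closed.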

Seems like we should add <div> in 3.1

azaroth42 avatar May 17 '22 18:05 azaroth42