
`<empty>`, when child of `<sequence>` or `<alternate>`, not correctly processed

Open sydb opened this issue 2 months ago • 3 comments

See #263; despite being closed, I do not think the problem identified by @lb42 has been solved.

The <empty> element, as a member of model.contentPart, is permitted only as a child of <content>, <sequence>, or <alternate> (and, after TEIC/TEI#2538 is merged, <interleave>).

content/empty

I do not believe there is any controversy when <empty> is a child of <content> — this has been exercised by both the Guidelines and lots of people’s ODDs dozens or hundreds of times. Since <content> is required to have 1 and only 1 child, there is no sibling rivalry between <empty> and its siblings, as it has none.

alternate/empty

But when <empty> is a child of <alternate>, its sibling seems to just beat it to death:

 <alternate minOccurs="1" maxOccurs="1">
   <empty/>
   <elementRef key="add" minOccurs="1" maxOccurs="1"/>
 </alternate>

should produce an optional <add> — either ( empty | add ) or ( add )? or ( add? ) or perhaps even ( add | empty ). But what it actually produces is just ( add ), i.e. a required <add>, not an optional <add>. (Yes, I realize the correct effect can be obtained by using the much simpler <elementRef key="add" minOccurs="0" maxOccurs="1"/>, but that’s not the point.)
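For comparison, here is a sketch of the first of those acceptable outputs, ( empty | add ), in RELAX NG XML syntax. The pattern name `add` is an assumption for illustration; the names the Stylesheets actually emit (for example, with a prefix) may differ.

```xml
<!-- Sketch only: the pattern name "add" is assumed, not copied from real Stylesheets output. -->
<rng:choice xmlns:rng="http://relaxng.org/ns/structure/1.0">
  <!-- Matching nothing is one branch of the choice, which makes <add> optional. -->
  <rng:empty/>
  <rng:ref name="add"/>
</rng:choice>
```

In RELAX NG this is equivalent to wrapping the reference in `<rng:optional>`.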

sequence/empty

I have discovered at least one circumstance for which incorrect output is generated when <empty> is a child of <sequence>. Consider the following PureODD construction. While admittedly a bit off the beaten track, the intent is for a content model that allows either 0 <docDate> elements or 2 or more <docDate> elements — i.e., any number of <docDate>s except one; furthermore, if there are any <docDate>s there can also be global stuff with them.

 <content>
   <sequence minOccurs="0" maxOccurs="1">
     <empty/>
     <sequence minOccurs="2" maxOccurs="unbounded">
       <elementRef key="docDate" minOccurs="1" maxOccurs="1"/>
       <classRef key="model.global" minOccurs="0" maxOccurs="unbounded"/>
     </sequence>
   </sequence>
 </content>

However, the RELAX NG produced by these Stylesheets seems to completely lose the outer <sequence minOccurs="0" maxOccurs="1"> clause. I.e., the <empty> seems to commit not only suicide (which was expected), but parricide as well:

 (
   docDate, model.global*,
   docDate, model.global*,
   ( docDate, model.global* )*
 )

If the outer <sequence> is changed to an <alternate>, the correct model is generated.
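A sketch of a RELAX NG pattern (XML syntax) matching the stated intent, zero `<docDate>`s or two or more, might look like the following. It is the reported output wrapped in `<rng:optional>`, which stands in for the lost outer `<sequence minOccurs="0" maxOccurs="1">`; the `<empty/>` is omitted because an empty pattern contributes nothing inside a group. The pattern names `docDate` and `model.global` are assumptions taken from the abbreviated model above, not from actual Stylesheets output.

```xml
<!-- Sketch only: pattern names are assumed for illustration. -->
<rng:optional xmlns:rng="http://relaxng.org/ns/structure/1.0">
  <!-- first required occurrence of the inner sequence -->
  <rng:group>
    <rng:ref name="docDate"/>
    <rng:zeroOrMore><rng:ref name="model.global"/></rng:zeroOrMore>
  </rng:group>
  <!-- second required occurrence (minOccurs="2") -->
  <rng:group>
    <rng:ref name="docDate"/>
    <rng:zeroOrMore><rng:ref name="model.global"/></rng:zeroOrMore>
  </rng:group>
  <!-- any further occurrences (maxOccurs="unbounded") -->
  <rng:zeroOrMore>
    <rng:group>
      <rng:ref name="docDate"/>
      <rng:zeroOrMore><rng:ref name="model.global"/></rng:zeroOrMore>
    </rng:group>
  </rng:zeroOrMore>
</rng:optional>
```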

It is possible these two problems are related, although I think it unlikely. (So it may be more convenient to split this into two issues.)

I plan to post an ODD that demonstrates these situations shortly.

sydb, Oct 05 '25 01:10

@sydb Should we expect processing of <empty/> to be different in a sequence of elements (rather than attributes)? An empty element might hold 20 attributes, but isn't it impossible for <content> to specify <empty/> and then other elements that come after that emptiness?

What processing behavior / RELAX NG output should be expected when <empty/> is set in a <sequence> of element content models? If the intention is to permit either emptiness or elements, isn't it incorrect to encode that with <sequence>?

I might be missing something really important here, and if so, I apologize. This question comes from my teaching a RELAX NG unit: when encoding content models in RELAX NG (at least in compact syntax), one uses names in the content model that are defined elsewhere, so that we can do this:

someElement = element myElement {empty, r, n }

r = attribute ref {xsd:IDREFS}
n = attribute num {xsd:integer}

and represent this:

<myElement ref="id1 id2" num="5"/>

But in ODD, since we use <classes>, <attList>, and <attDef> and don't define attributes in <content>, does it ever make sense to allow <empty/> in a <sequence> within <content>? If it doesn't make sense, should ODD processing throw an outright error on attempted conversion to RELAX NG?

ebeshero, Oct 07 '25 11:10

Sorry I took so long to get to this, @ebeshero. While at first glance it does not make a lot of sense to include <tei:empty> in a <tei:sequence>, it is allowed, in part to parallel <rng:empty>, which is allowed as a child of <rng:group> (and of <rng:choice>).

But at second glance there is at least one reason to have it there: it can be useful when processing ODDs. If I have

<content>
  <sequence>
    <ref key="duck"/>
    <ref key="quack"/>
  </sequence>
</content>

but the element <quack> does not exist in this schema (either because there is an <elementSpec mode="delete"> for it, or because it was on the @except of the <moduleRef> that normally would have included it), an ODD processor has to do something. The two most obvious options are “just delete the <ref key="quack">” and “replace the <ref key="quack"> with <empty>”. If I do the former, I end up with an invalid <sequence> element, since it requires 2+ children.
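As a sketch of the second option, the content model after replacement might look like this. It keeps the `<ref>` notation of the example above and is illustrative only, not the output of any particular ODD processor.

```xml
<content>
  <sequence>
    <ref key="duck"/>
    <!-- stands in for the reference to the deleted <quack> -->
    <empty/>
  </sequence>
</content>
```

The `<sequence>` still has two children, so it remains valid, and the resulting model simply requires a `<duck>`.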

You might point out that since <rng:group> does not require 2+ children, we could drop that constraint. But it does require 1+ children, so if <duck> were missing, too, we either need to replace the <ref>s with <empty> or do complicated heuristics to recognize when the <group> has no children and remove it (and maybe its parent once it is removed, etc.).
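If both `<duck>` and `<quack>` were missing, the replace-with-`<empty>` strategy might plausibly lead to a RELAX NG group like the following. This is a hedged sketch, not actual Stylesheets output.

```xml
<!-- Sketch only: both references have been replaced by empty patterns. -->
<rng:group xmlns:rng="http://relaxng.org/ns/structure/1.0">
  <rng:empty/>
  <rng:empty/>
</rng:group>
```

The group still has children, so no pruning heuristics are needed, and the pattern matches only empty content.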

sydb, Nov 25 '25 14:11