Is this FoLiA valid? And how to handle it...
Given this FoLiA example:
<?xml version="1.0" encoding="UTF-8"?>
<FoLiA xmlns:xlink="http://www.w3.org/1999/xlink" xmlns="http://ilk.uvt.nl/folia" xml:id="apart" generator="libfolia-v0.11" version="1.5">
<metadata type="native">
<annotations>
</annotations>
</metadata>
<text xml:id="text">
<div xml:id="text.div.1">
<head xml:id="head">
<t>Kop</t>
</head>
<p xml:id="text.div.1.p.1">
<t>OK?</t>
</p>
<s xml:id="text.div.1.s.1">
<t>Inleiding</t>
<w xml:id="text.div.1.s.1.w.1">
<t>Inleiding</t>
</w>
</s>
</div>
</text>
</FoLiA>
It is accepted by the validator and also by folialint.
There may be an issue here: the <div> has both a Sentence and a Paragraph as direct children.
This is valid FoLiA, but it may go against the 'gut feeling' that in this case the Sentence should be embedded in a Paragraph. Should this feeling be formalized (and if so, how)?
If not, this has ramifications for a lot of FoLiA-based software, like Ucto, Frog, TICCL etc., which assumes either bare sentences or paragraphs containing sentences, but not a mix of both.
Yes, it is indeed valid. I agree it would be nicer if it were homogeneous, but I don't think we can or should enforce that in FoLiA itself. If tools like ucto/frog impose such extra constraints, that's fair enough, I'd say.
Hmmm. Imposing such implicit semantics might hit us hard in the future. But making homogeneity a strict prerequisite may be hard to implement (can a DTD express this at all?). Maybe it is good to document this as 'good behavior' for FoLiA users, or to have the validator point it out. I will consider adding warnings (or even fatal errors) to the tools under my control.
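To make the "warning" idea concrete, here is a rough sketch of such a check. It is not how the validator, folialint, or any of the tools mentioned work; it only uses Python's standard `xml.etree.ElementTree`, and the function name `find_mixed_containers` is made up for illustration. It assumes that "mixed" means an element with both `<p>` and `<s>` as direct children, as in the example above.

```python
import xml.etree.ElementTree as ET

FOLIA_NS = "http://ilk.uvt.nl/folia"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"  # how ElementTree exposes xml:id


def find_mixed_containers(root):
    """Return the xml:ids of elements that have both <p> and <s> as direct children.

    Hypothetical helper for illustration only; not part of any FoLiA library.
    """
    p_tag = f"{{{FOLIA_NS}}}p"
    s_tag = f"{{{FOLIA_NS}}}s"
    mixed = []
    for elem in root.iter():
        # Only direct children are inspected, not deeper descendants.
        child_tags = {child.tag for child in elem}
        if p_tag in child_tags and s_tag in child_tags:
            mixed.append(elem.get(XML_ID, "(no id)"))
    return mixed


# Trimmed copy of the example from this issue (metadata omitted).
SAMPLE = """<FoLiA xmlns="http://ilk.uvt.nl/folia" xml:id="apart" version="1.5">
<text xml:id="text">
<div xml:id="text.div.1">
<head xml:id="head"><t>Kop</t></head>
<p xml:id="text.div.1.p.1"><t>OK?</t></p>
<s xml:id="text.div.1.s.1">
<t>Inleiding</t>
<w xml:id="text.div.1.s.1.w.1"><t>Inleiding</t></w>
</s>
</div>
</text>
</FoLiA>"""

root = ET.fromstring(SAMPLE)
for xml_id in find_mixed_containers(root):
    print(f"warning: {xml_id} has both <p> and <s> as direct children")
```

Whether a hit like this should be a warning or a fatal error would be up to each tool; the sketch only reports the offending element.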
See also issue #42.