FoLiA nodes with 'mixed' structure
Consider this example:
<?xml version='1.0' encoding='utf-8'?>
<FoLiA xmlns:xlink="http://www.w3.org/1999/xlink" xmlns="http://ilk.uvt.nl/folia" version="1.5.1" xml:id="page" generator="pynlpl.formats.folia-v1.5.1.88">
<metadata type="native">
<annotations>
<token-annotation annotator="ucto" annotatortype="auto" datetime="2017-10-01T17:33:00" set="tokconfig-nld"/>
</annotations>
<meta id="language">nld</meta>
</metadata>
<text xml:id="text">
<s xml:id="s.1"><t>test twee</t></s>
<p xml:id="p1">
<w xml:id="w.1">
<t>test</t>
</w>
<w xml:id="w.2">
<t>aha</t>
</w>
<s xml:id="s.2">
<t>Een brief voor de koning.</t>
</s>
</p>
</text>
</FoLiA>
At the moment, Frog ignores the two words in the paragraph (w.1 and w.2) and only processes the sentence inside it (s.2). This is questionable. But if we do want to handle those two loose words, what is the desired behaviour? Should we create a sentence out of them, or leave them separate? This also involves Ucto, since it is used to create the sentences (though not in the new Frog implementation we are working on).
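To make the 'mixed' case concrete, here is a minimal sketch that lists the words sitting directly under a paragraph rather than inside a sentence. It uses only Python's standard `xml.etree.ElementTree`, not Frog, Ucto, or the FoLiA library; the helper name `loose_words` and the trimmed `SAMPLE` document are assumptions made for illustration, and it says nothing about how Frog detects or should treat these nodes.

```python
import xml.etree.ElementTree as ET

FOLIA_NS = "http://ilk.uvt.nl/folia"
# ElementTree exposes xml:id under the reserved XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Trimmed version of the example above (metadata omitted); illustrative only.
SAMPLE = """<FoLiA xmlns="http://ilk.uvt.nl/folia" version="1.5.1" xml:id="page">
  <text xml:id="text">
    <s xml:id="s.1"><t>test twee</t></s>
    <p xml:id="p1">
      <w xml:id="w.1"><t>test</t></w>
      <w xml:id="w.2"><t>aha</t></w>
      <s xml:id="s.2"><t>Een brief voor de koning.</t></s>
    </p>
  </text>
</FoLiA>"""


def loose_words(folia_xml):
    """Return (paragraph id, word id, text) for every <w> that is a direct
    child of a <p>, i.e. a word that is not wrapped in a sentence.

    Hypothetical helper, not part of any FoLiA tool.
    """
    root = ET.fromstring(folia_xml)
    found = []
    for p in root.iter(f"{{{FOLIA_NS}}}p"):
        for child in p:  # direct children only, so words inside <s> are skipped
            if child.tag == f"{{{FOLIA_NS}}}w":
                t = child.find(f"{{{FOLIA_NS}}}t")
                text = t.text.strip() if t is not None and t.text else ""
                found.append((p.get(XML_ID), child.get(XML_ID), text))
    return found


if __name__ == "__main__":
    for par_id, word_id, text in loose_words(SAMPLE):
        print(par_id, word_id, text)
```

On the sample this reports w.1 and w.2 under p1 and skips s.2, which is exactly the set of nodes whose treatment is the open question here.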