Schema: NER restriction
The current schema allows a named entity that contains no words, for example:
```xml
<name type="LOC">
  <kinesic type="applause">
    <desc>Oklaski</desc>
  </kinesic>
</name>
```
https://github.com/clarin-eric/ParlaMint/blob/92ba447bf720cf48d038ec3044257534332f18a7/Schema/ParlaMint-TEI.ana.rng#L112-L121
The schema should be restricted so that every named entity contains:
- `oneOrMore` named entities or words, and
- `zeroOrMore` comments.
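A rough sketch of how this restriction might look as a RelaxNG content model (XML syntax, as in `ParlaMint-TEI.ana.rng`). The define names `ner.name`, `word` and `comment` are hypothetical placeholders for whatever the ParlaMint schema actually uses, and the fragment is assumed to sit inside the schema's `<grammar>`:

```xml
<!-- Hypothetical define names: "ner.name", "word" and "comment" stand in
     for the patterns actually defined in ParlaMint-TEI.ana.rng. -->
<define name="ner.name">
  <element name="name" ns="http://www.tei-c.org/ns/1.0">
    <!-- assumed required here, as in the example; copy the real attribute list -->
    <attribute name="type"/>
    <interleave>
      <!-- at least one token: a word or a nested named entity -->
      <oneOrMore>
        <choice>
          <ref name="ner.name"/>
          <ref name="word"/>
        </choice>
      </oneOrMore>
      <!-- comments (kinesic, vocal, ...) may appear anywhere, any number of times -->
      <zeroOrMore>
        <ref name="comment"/>
      </zeroOrMore>
    </interleave>
  </element>
</define>
```

Using `interleave` rather than a plain sequence is what lets comments appear between the tokens of a multi-word entity instead of only before or after them. Under this pattern, the `<name>` example above would be rejected, since its only child is a `kinesic`.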
Related issue: #84
I agree that this should be restricted, but:
- Did you actually find such cases in the corpora? As far as I can see, at least the example you gave does not exist in the PL corpus. I would be surprised if it did, as incidents were excluded from annotation, so the system would in effect be annotating an empty string as a named entity.
- It will make the content model more complicated. In fact, I'm not really sure how to implement such a restriction; I would have to study RelaxNG first.
Not saying I won't do it, just maybe not straight away.
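If the main worry is complicating the RelaxNG content model, one alternative (not proposed in this thread) would be to leave the content model unchanged and express only the "not empty" part as an ISO Schematron assertion. This is a sketch that assumes the validation pipeline can run Schematron, and that named-entity annotation only occurs inside `tei:body`:

```xml
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt2">
  <sch:ns prefix="tei" uri="http://www.tei-c.org/ns/1.0"/>
  <sch:pattern id="ner-not-empty">
    <!-- assumption: NER <name> elements only appear inside tei:body,
         so names in the teiHeader are not affected by this rule -->
    <sch:rule context="tei:body//tei:name">
      <!-- require at least one word or nested named entity as a direct child -->
      <sch:assert test="tei:w or tei:name">
        A named entity must contain at least one word (w) or nested named entity (name).
      </sch:assert>
    </sch:rule>
  </sch:pattern>
</sch:schema>
```

This only enforces the `oneOrMore` part; which children are allowed at all would still come from the RelaxNG schema. Running the same rule over the existing corpora would also show whether any such empty entities actually occur.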
> Did you actually find such cases in the corpora?

No, I built it based on a misunderstood example from #84.

> It will make the content model more complicated. In fact, I'm not really sure how to implement such a restriction; I would have to study RelaxNG first.

I don't know either. (CZ NER has already made the schema quite complicated...)

> Not saying I won't do it, just maybe not straight away.

OK, let's keep this issue open for the next releases.
This is obviously "future"....