dkpro-jwktl
Add inflection group property to the word form
As discussed in #57, add the `int inflectionGroup` property to `IWiktionaryWordForm`.
The purpose of this property is to identify which "column" of the inflection table a word form belongs to.
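A minimal Java sketch of what the proposed property could look like. `WordFormWithGroup` and `SimpleWordForm` are hypothetical stand-ins, not JWKTL types. The real change would add a getter to `IWiktionaryWordForm` and its implementation. The convention of using 0 for forms whose template parameter carries no column number follows the Fels list below.

```java
// Hypothetical sketch: these types are illustrative stand-ins, not JWKTL's API.
public class InflectionGroupSketch {

    /** Stand-in for a word form, reduced to the form itself and the proposed property. */
    interface WordFormWithGroup {
        String getWordForm();

        /**
         * Column of the inflection table this form belongs to, taken from the number
         * in the template parameter (e.g. 2 for "Dativ Singular 2"); 0 when the
         * parameter has no number (e.g. "Dativ Plural").
         */
        int getInflectionGroup();
    }

    /** Immutable implementation used only for illustration. */
    static final class SimpleWordForm implements WordFormWithGroup {
        private final String wordForm;
        private final int inflectionGroup;

        SimpleWordForm(String wordForm, int inflectionGroup) {
            this.wordForm = wordForm;
            this.inflectionGroup = inflectionGroup;
        }

        @Override
        public String getWordForm() {
            return wordForm;
        }

        @Override
        public int getInflectionGroup() {
            return inflectionGroup;
        }
    }

    public static void main(String[] args) {
        // Corresponds to "|Dativ Singular 2=Felsen" in the template below.
        WordFormWithGroup datSg2 = new SimpleWordForm("Felsen", 2);
        System.out.println(datSg2.getWordForm() + " -> inflection group " + datSg2.getInflectionGroup());
    }
}
```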
Consider the word Fels for example:
{{Deutsch Substantiv Übersicht
|Genus 1=m
|Genus 2=m
|Nominativ Singular 1=Fels
|Nominativ Singular 2=Fels
|Nominativ Plural=Felsen
|Genitiv Singular 1=Fels
|Genitiv Singular 1*=Felses
|Genitiv Singular 1**=Felsens
|Genitiv Singular 2=Felsen
|Genitiv Plural=Felsen
|Dativ Singular 1=Fels
|Dativ Singular 2=Felsen
|Dativ Plural=Felsen
|Akkusativ Singular 1=Fels
|Akkusativ Singular 2=Felsen
|Akkusativ Plural=Felsen
}}
We would thus generate 14 word forms with the following properties:
- form=Fels, gender=MASC, num=SING, case=NOM, inflectionGroup=1
- form=Fels, gender=MASC, num=SING, case=NOM, inflectionGroup=2
- form=Felsen, gender=null, num=PL, case=NOM, inflectionGroup=0
- form=Fels, gender=MASC, num=SING, case=GEN, inflectionGroup=1
- form=Felses, gender=MASC, num=SING, case=GEN, inflectionGroup=1
- form=Felsens, gender=MASC, num=SING, case=GEN, inflectionGroup=1
- form=Felsen, gender=MASC, num=SING, case=GEN, inflectionGroup=2
- form=Felsen, gender=null, num=PL, case=GEN, inflectionGroup=0
- form=Fels, gender=MASC, num=SING, case=DAT, inflectionGroup=1
- form=Felsen, gender=MASC, num=SING, case=DAT, inflectionGroup=2
- form=Felsen, gender=null, num=PL, case=DAT, inflectionGroup=0
- form=Fels, gender=MASC, num=SING, case=ACC, inflectionGroup=1
- form=Felsen, gender=MASC, num=SING, case=ACC, inflectionGroup=2
- form=Felsen, gender=null, num=PL, case=ACC, inflectionGroup=0
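The mapping from template parameters to the list above can be sketched in Java as follows. The sketch takes the inflection group from the number at the end of each parameter name and uses 0 when there is none. Alternative forms marked with `*` or `**` keep the group of their column. The class, the `Form` type and the regular expression are hypothetical and not JWKTL's parser. Gender (from `Genus N`) and the mapping to JWKTL's case and number types are left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, not JWKTL's parser: derives case, number and inflection
// group from the parameter names of the "Deutsch Substantiv Übersicht" template.
public class FelsInflectionGroups {

    // Matches e.g. "Genitiv Singular 1*" or "Nominativ Plural". The optional digit is
    // the column (inflection group); trailing asterisks mark alternative forms of the
    // same column, so they do not change the group.
    private static final Pattern PARAM = Pattern.compile(
            "(Nominativ|Genitiv|Dativ|Akkusativ) (Singular|Plural)(?: (\\d+))?\\**");

    static final String[] FELS = {
        "|Genus 1=m", "|Genus 2=m",
        "|Nominativ Singular 1=Fels", "|Nominativ Singular 2=Fels", "|Nominativ Plural=Felsen",
        "|Genitiv Singular 1=Fels", "|Genitiv Singular 1*=Felses", "|Genitiv Singular 1**=Felsens",
        "|Genitiv Singular 2=Felsen", "|Genitiv Plural=Felsen",
        "|Dativ Singular 1=Fels", "|Dativ Singular 2=Felsen", "|Dativ Plural=Felsen",
        "|Akkusativ Singular 1=Fels", "|Akkusativ Singular 2=Felsen", "|Akkusativ Plural=Felsen",
    };

    static final class Form {
        final String form;
        final String grammaticalCase;
        final String number;
        final int inflectionGroup;

        Form(String form, String grammaticalCase, String number, int inflectionGroup) {
            this.form = form;
            this.grammaticalCase = grammaticalCase;
            this.number = number;
            this.inflectionGroup = inflectionGroup;
        }

        @Override
        public String toString() {
            return "form=" + form + ", case=" + grammaticalCase + ", num=" + number
                    + ", inflectionGroup=" + inflectionGroup;
        }
    }

    /** Parses one "|Name=Value" line; returns null for non-case parameters such as "Genus 1". */
    static Form parse(String line) {
        String body = line.startsWith("|") ? line.substring(1) : line;
        int eq = body.indexOf('=');
        if (eq < 0) {
            return null;
        }
        Matcher m = PARAM.matcher(body.substring(0, eq).trim());
        if (!m.matches()) {
            return null;
        }
        int group = (m.group(3) == null) ? 0 : Integer.parseInt(m.group(3));
        return new Form(body.substring(eq + 1).trim(), m.group(1), m.group(2), group);
    }

    /** Collects the word forms of all case/number parameters, skipping everything else. */
    static List<Form> parseAll(String[] lines) {
        List<Form> forms = new ArrayList<>();
        for (String line : lines) {
            Form f = parse(line);
            if (f != null) {
                forms.add(f);
            }
        }
        return forms;
    }

    public static void main(String[] args) {
        List<Form> forms = parseAll(FELS);
        forms.forEach(System.out::println);
        System.out.println(forms.size() + " word forms");
    }
}
```

Run on the Fels parameters, this yields the 14 forms listed above, without the gender column.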
I will work on this in my fork and send the PR later on.
@chmeyer Once again, there's a surprise. Some entries contain several declension tables (I've seen two so far). For instance: https://de.wiktionary.org/wiki/Apfelschorle
This means we can't simply use the X from `Nominativ Singular X` as the index; it's not that simple.
My suggestion is to add 4 to the index for every further table. For the Apfelschorle example, we'd have inflection group 1 and then 5. 4 is chosen because there should be at most 4 inflection groups per table per number.
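The offset scheme can be sketched in Java as follows, assuming 0-based table indices and the stated maximum of 4 groups per table. The class and method names are hypothetical. How unnumbered forms (group 0, such as the plurals of Fels) should be handled across tables is not settled in the proposal. This sketch simply leaves them at 0, which is an assumption. The decoding helpers show one property of the scheme: the table index and local column can be recovered from the group number alone.

```java
// Hypothetical helper illustrating the "+4 per further table" proposal.
public class OffsetInflectionGroups {

    // Upper bound from the proposal: at most 4 inflection groups per table per number.
    static final int MAX_GROUPS_PER_TABLE = 4;

    /**
     * @param tableIndex 0 for the first declension table, 1 for the second, ...
     * @param localGroup the X from "Nominativ Singular X"; 0 if there is no number
     */
    static int globalGroup(int tableIndex, int localGroup) {
        if (localGroup < 0 || localGroup > MAX_GROUPS_PER_TABLE) {
            throw new IllegalArgumentException("unexpected group " + localGroup);
        }
        if (localGroup == 0) {
            return 0; // assumption: unnumbered forms keep group 0 in every table
        }
        return localGroup + MAX_GROUPS_PER_TABLE * tableIndex;
    }

    /** Recovers the 0-based table index from a numbered global group. */
    static int tableIndexOf(int globalGroup) {
        return (globalGroup - 1) / MAX_GROUPS_PER_TABLE;
    }

    /** Recovers the per-table column number from a numbered global group. */
    static int localGroupOf(int globalGroup) {
        return (globalGroup - 1) % MAX_GROUPS_PER_TABLE + 1;
    }

    public static void main(String[] args) {
        // Apfelschorle as described above: group 1 in the first and in the second table.
        System.out.println(globalGroup(0, 1)); // 1
        System.out.println(globalGroup(1, 1)); // 5
        System.out.println(tableIndexOf(5) + " " + localGroupOf(5)); // 1 1
    }
}
```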
An alternative would be to count inflection groups across all tables. For the Apfelschorle example, we'd then have inflection groups 1 and 2.
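The running-count alternative can be sketched in Java as follows. Each table is modeled as the list of local group numbers it uses. The Apfelschorle case is modeled as two tables that each contain group 1, which is what the numbers in the comment above imply. The class and method names are hypothetical. As in the offset sketch, leaving unnumbered forms (0) out of the count is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical helper illustrating a running count of inflection groups across tables.
public class RunningInflectionGroups {

    /**
     * For each table, maps its local group numbers (the X in "Nominativ Singular X")
     * to consecutive global numbers, continuing the count from the previous table.
     */
    static List<Map<Integer, Integer>> assign(List<List<Integer>> localGroupsPerTable) {
        List<Map<Integer, Integer>> result = new ArrayList<>();
        int next = 1;
        for (List<Integer> table : localGroupsPerTable) {
            Map<Integer, Integer> mapping = new TreeMap<>();
            // Distinct local groups in ascending order, so numbering is deterministic.
            for (int local : new TreeSet<>(table)) {
                if (local == 0) {
                    continue; // assumption: unnumbered forms are not counted and stay 0
                }
                mapping.put(local, next++);
            }
            result.add(mapping);
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<Integer>> apfelschorle = List.of(List.of(1), List.of(1));
        System.out.println(assign(apfelschorle)); // [{1=1}, {1=2}]
    }
}
```

Compared with the offset scheme, this keeps the numbers compact. However, the table a group came from can no longer be derived from the number itself, so it would have to be stored separately if it is needed.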
What do you think?