dkpro-jwktl

Add inflection group property to the word form

Open highsource opened this issue 6 years ago • 2 comments

As discussed in #57, add the int inflectionGroup property to IWiktionaryWordForm.

The purpose of this property is to identify which "column" of the inflection table a word form belongs to.
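To illustrate what such a property would allow, here is a minimal sketch that groups word forms back into table columns. `WordFormSketch` and `byColumn` are hypothetical names made up for this example; they are not part of JWKTL, and the real `IWiktionaryWordForm` carries more properties than shown here.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical stand-in for a word form, reduced to the two fields needed here.
final class WordFormSketch {
    final String form;
    final int inflectionGroup; // assumption: 0 means "no explicit column index"

    WordFormSketch(String form, int inflectionGroup) {
        this.form = form;
        this.inflectionGroup = inflectionGroup;
    }

    /** Groups the forms by inflection group, i.e. by inflection table column. */
    static Map<Integer, List<String>> byColumn(List<WordFormSketch> forms) {
        return forms.stream().collect(Collectors.groupingBy(
                f -> f.inflectionGroup,
                TreeMap::new, // keeps the columns sorted by group number
                Collectors.mapping(f -> f.form, Collectors.toList())));
    }
}
```

Without the group number, two forms that share case, number and gender could not be assigned to the column they came from.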

Consider the word Fels for example:

{{Deutsch Substantiv Übersicht
|Genus 1=m
|Genus 2=m
|Nominativ Singular 1=Fels
|Nominativ Singular 2=Fels
|Nominativ Plural=Felsen
|Genitiv Singular 1=Fels
|Genitiv Singular 1*=Felses
|Genitiv Singular 1**=Felsens
|Genitiv Singular 2=Felsen
|Genitiv Plural=Felsen
|Dativ Singular 1=Fels
|Dativ Singular 2=Felsen
|Dativ Plural=Felsen
|Akkusativ Singular 1=Fels
|Akkusativ Singular 2=Felsen
|Akkusativ Plural=Felsen
}}


Here we would generate 14 word forms with the following properties:

  • form=Fels, gender=MASC, num=SING, case=NOM, inflectionGroup=1
  • form=Fels, gender=MASC, num=SING, case=NOM, inflectionGroup=2
  • form=Felsen, gender=null, num=PL, case=NOM, inflectionGroup=0
  • form=Fels, gender=MASC, num=SING, case=GEN, inflectionGroup=1
  • form=Felses, gender=MASC, num=SING, case=GEN, inflectionGroup=1
  • form=Felsens, gender=MASC, num=SING, case=GEN, inflectionGroup=1
  • form=Felsen, gender=MASC, num=SING, case=GEN, inflectionGroup=2
  • form=Felsen, gender=null, num=PL, case=GEN, inflectionGroup=0
  • form=Fels, gender=MASC, num=SING, case=DAT, inflectionGroup=1
  • form=Felsen, gender=MASC, num=SING, case=DAT, inflectionGroup=2
  • form=Felsen, gender=null, num=PL, case=DAT, inflectionGroup=0
  • form=Fels, gender=MASC, num=SING, case=ACC, inflectionGroup=1
  • form=Felsen, gender=MASC, num=SING, case=ACC, inflectionGroup=2
  • form=Felsen, gender=null, num=PL, case=ACC, inflectionGroup=0
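The mapping from template parameter names to the inflectionGroup values listed above could be sketched as follows. `InflectionGroupParser` is a hypothetical helper, not existing JWKTL code, and it assumes that parameter names follow the "case number [index][asterisks]" pattern seen in the Fels example, with a missing index mapped to 0.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of JWKTL: derives the inflection group
// from a parameter name of the "Deutsch Substantiv Übersicht" template.
final class InflectionGroupParser {
    // Matches e.g. "Genitiv Singular 1**": case, number, optional index,
    // then optional asterisks marking alternative forms in the same column.
    private static final Pattern PARAM = Pattern.compile(
            "(Nominativ|Genitiv|Dativ|Akkusativ) (Singular|Plural)(?: (\\d+))?\\**");

    private InflectionGroupParser() {}

    /**
     * Returns the inflection group, 0 if the name carries no index,
     * or -1 if the name is not a word form parameter (e.g. "Genus 1").
     */
    static int inflectionGroup(String parameterName) {
        Matcher m = PARAM.matcher(parameterName.trim());
        if (!m.matches()) {
            return -1;
        }
        String index = m.group(3);
        return index == null ? 0 : Integer.parseInt(index);
    }
}
```

With this sketch, "Genitiv Singular 1", "Genitiv Singular 1*" and "Genitiv Singular 1**" all land in group 1, while "Genitiv Plural" lands in group 0.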

highsource • Aug 12 '18 15:08

I will work on this in my fork and send a PR later on.

highsource • Aug 12 '18 15:08

@chmeyer Once again, there's a surprise. Some entries contain several declension tables (I've seen two so far). For instance: https://de.wiktionary.org/wiki/Apfelschorle

This means we can't simply use the X from "Nominativ Singular X" as the index, because the same X can occur in more than one table.

My suggestion is to add 4 to the index for every further table. For the Apfelschorle example we'd have inflection group 1 and then 5. I chose 4 because there should be at most 4 inflection groups per table per number.

An alternative would be to number the inflection groups consecutively across all the tables. For the Apfelschorle example we'd have inflection group 1 and then 2.
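Both numbering schemes can be compared in a minimal sketch. `InflectionGroupNumbering` and its methods are hypothetical, not JWKTL code. The sketch assumes 0-based table positions and in-table indices starting at 1, and it leaves open how unindexed forms (group 0) should be numbered across tables.

```java
// Hypothetical helper illustrating both proposals; not JWKTL code.
final class InflectionGroupNumbering {
    // Assumed upper bound on inflection groups per table, taken from the comment.
    static final int MAX_GROUPS_PER_TABLE = 4;

    private InflectionGroupNumbering() {}

    /** Proposal 1: shift the in-table index by 4 for every further table. */
    static int withFixedOffset(int tableIndex, int indexInTable) {
        if (indexInTable < 1 || indexInTable > MAX_GROUPS_PER_TABLE) {
            throw new IllegalArgumentException("index out of range: " + indexInTable);
        }
        return tableIndex * MAX_GROUPS_PER_TABLE + indexInTable;
    }

    /** Proposal 2: count groups consecutively across all tables. */
    static int withRunningCount(int[] groupsPerTable, int tableIndex, int indexInTable) {
        int offset = 0;
        // Sum the number of groups in all tables before this one.
        for (int t = 0; t < tableIndex; t++) {
            offset += groupsPerTable[t];
        }
        return offset + indexInTable;
    }
}
```

The fixed offset can be computed for each table in isolation but leaves gaps in the numbering. The running count is dense but requires knowing how many groups every preceding table has.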

What do you think?

highsource • Aug 26 '18 18:08