DEMorphy

Publish on PyPI

Status: Open. lsmith77 opened this issue 1 year ago • 8 comments

It would be awesome to get the project registered on https://pypi.org/.

lsmith77, Oct 19 '22 10:10

Thanks for your comment! I wrote the library quite some time ago and don't remember why I skipped registering it on PyPI. Though the project is Python 3 compatible, I still want to make some revisions. I'll do that when I have time, and after that I can register the new package.

DuyguA, Oct 19 '22 10:10

That would be amazing. I was planning to try out this project. We are currently using https://github.com/gambolputty/german-nouns but are hoping to find a single library that can handle nouns, verbs and adjectives for German.

lsmith77, Oct 19 '22 10:10

FYI, our use case is our inclusive writing assistant https://www.witty.works/, and we are looking for ways to make our alternatives grammatically correct.

So we will need to align the word(s) we detected as problematic with the alternatives.

For example: ambitionierten => engagierten
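For regular adjective endings like the one in this example, a naive baseline is to strip the lemma from the detected word and copy the remaining ending onto the alternative's lemma. This is a sketch only: it assumes the lemmas are already available (e.g. from spaCy), `transfer_ending` is a hypothetical helper rather than part of DEMorphy, and it deliberately gives up whenever the inflected form is not simply lemma + ending (umlauts, an infixed zu, and so on), which is where real morphological analysis is needed.

```python
from typing import Optional


def transfer_ending(surface: str, lemma: str, alternative_lemma: str) -> Optional[str]:
    """Copy the inflectional ending of `surface` onto `alternative_lemma`.

    Only works when `surface` is literally `lemma` plus an ending;
    returns None otherwise so the caller can fall back to a real analyzer.
    """
    if not surface.startswith(lemma):
        # e.g. sicherzustellen vs. sicherstellen: the change is inside the word
        return None
    ending = surface[len(lemma):]  # "en" for ambitionierten / ambitioniert
    return alternative_lemma + ending


print(transfer_ending("ambitionierten", "ambitioniert", "engagiert"))  # engagierten
print(transfer_ending("sicherzustellen", "sicherstellen", "umstellen"))  # None
```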

lsmith77, Oct 19 '22 10:10

Ah OK, got it: you need to match the morphological features as well. OK then, I'll update you here when I'm finished.

DuyguA, Oct 19 '22 18:10

Exactly. Thank you so much for your work!

lsmith77, Oct 19 '22 20:10

Another wrinkle is sicherzustellen, which has the spaCy lemma sicherstellen.

So if we have the alternative umsetzen, we need to transform it into umzusetzen, and an alternative bewirken needs to become zu bewirken.

Not sure if compound word splitting is within the scope here.

lsmith77, Jan 16 '23 21:01

No, compound splitting is indeed not in scope. However, the case of sicherzustellen should be fairly easy. The lemma is not a substring of the surface form because there's a zu in between. If you split the surface form at zu and join the pieces, you get the lemma, so you can divide this word as sicher + zu + stellen.
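The split described above can be sketched as a small check: look for a zu inside the surface form such that removing it yields the lemma. This is a minimal sketch that assumes the lemma is already known (for example from spaCy); `split_zu_infinitive` is a hypothetical name, and forms where zu is a separate token (zu bewirken) are outside its scope.

```python
from typing import Optional, Tuple


def split_zu_infinitive(surface: str, lemma: str) -> Optional[Tuple[str, str]]:
    """Return (particle, base_verb) if `surface` is `lemma` with an infixed "zu"."""
    start = surface.find("zu", 1)  # the particle must be non-empty, so skip index 0
    while start != -1:
        particle, rest = surface[:start], surface[start + 2:]
        if particle + rest == lemma:
            return particle, rest
        start = surface.find("zu", start + 1)  # try the next occurrence of "zu"
    return None


print(split_zu_infinitive("sicherzustellen", "sicherstellen"))  # ('sicher', 'stellen')
print(split_zu_infinitive("stellen", "stellen"))  # None
```

Trying every occurrence of zu (rather than only the first) matters for verbs whose particle is itself zu, such as zuzumachen with the lemma zumachen.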

Actually, you can use my German corpus to generate a small model. I believe there are many zu-, um- and be-prefixed words in the corpus; you can show those words to [Phonetisaurus](https://github.com/AdolfVonKleist/Phonetisaurus). Phonetisaurus is originally a g2p (grapheme-to-phoneme) tool, but it can align sequences. So you can train an efficient seq2seq model where the input sequences are words as characters and the outputs are the surface forms you want to create. I have a community day on 27th Jan; if you want, I can schedule a small consultation to offer some solutions (or, better, make a tool for compound analysis; I've wanted to develop one for German for some time).
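To make the training-data idea concrete, here is a sketch that turns (lemma, target form) pairs into character-level entries. It assumes a tab-separated lexicon with the input word on the left and space-separated output symbols on the right, which is the usual shape of a g2p training dictionary; check the Phonetisaurus documentation for the exact format your version expects. The `_` symbol standing in for a space and the helper name `to_lexicon_line` are hypothetical choices, and the example pairs are just the ones mentioned in this thread, not a real training set.

```python
def to_lexicon_line(lemma: str, target: str, space_symbol: str = "_") -> str:
    """Format one training pair: input word, tab, one output symbol per character."""
    # A literal space would be read as a symbol separator, so map it to a placeholder.
    symbols = [space_symbol if ch == " " else ch for ch in target]
    return lemma + "\t" + " ".join(symbols)


# Example pairs taken from this discussion.
pairs = [
    ("bewirken", "zu bewirken"),
    ("hochstellen", "hochzustellen"),
    ("umstellen", "umzustellen"),
    ("sicherstellen", "sicherzustellen"),
]

for lemma, target in pairs:
    print(to_lexicon_line(lemma, target))
```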

DuyguA, Jan 17 '23 22:01

Thank you.

As noted, it is not too hard to detect that the source word sicherzustellen, with the spaCy lemma sicherstellen, has an injected zu.

The hard part is then taking an alternative like bewirken, hochstellen or umstellen and knowing where to place the zu to align the form, i.e. zu bewirken, hochzustellen and umzustellen. The prefix be is regular (always prepend a separate zu), but um is irregular. Also, hochstellen has the form adjective + verb, but to determine whether a given word is of that form, one first has to split it.
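One way to frame the placement problem: a separate zu is the default (simple verbs and inseparable prefixes such as be-), zu goes between particle and stem only for separable verbs, and prefixes like um can go either way (umfahren is separable in the sense "knock over" but inseparable in the sense "drive around"), so they cannot be decided from spelling alone. The sketch below assumes a lexicon that records the separable particle of each verb; `SEPARABLE_PARTICLE` and its entries are illustrative, and a real system would need a full lexicon or a morphological analyzer instead. Unlisted verbs are assumed not to be separable, so a verb like anmelden would come out wrong until it is added.

```python
from typing import Optional

# Prefixes that can be separable or inseparable depending on the verb (and sometimes its meaning).
AMBIGUOUS_PREFIXES = ("durch", "über", "um", "unter", "wider", "wieder")

# Hypothetical lexicon: verb -> separable particle. Would need to come from a real resource.
SEPARABLE_PARTICLE = {
    "hochstellen": "hoch",
    "sicherstellen": "sicher",
    "umsetzen": "um",
    "umstellen": "um",
}


def zu_infinitive(verb: str) -> Optional[str]:
    """Build the zu-infinitive of `verb`, or return None if it cannot be decided."""
    particle = SEPARABLE_PARTICLE.get(verb)
    if particle is not None:
        # Separable verb: zu goes between particle and stem (um + zu + stellen).
        return particle + "zu" + verb[len(particle):]
    if verb.startswith(AMBIGUOUS_PREFIXES):
        # Unlisted um-/über-/... verb: spelling alone does not tell us which form is right.
        return None
    # Simple verb or inseparable prefix (be-, ver-, ...): zu stays a separate word.
    # Assumes every separable verb we care about is listed in SEPARABLE_PARTICLE.
    return "zu " + verb


for verb in ["bewirken", "hochstellen", "umstellen", "umfahren"]:
    print(verb, "->", zu_infinitive(verb))
```

Returning None for the ambiguous cases, rather than guessing, lets the caller skip that alternative or ask a richer analyzer.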

lsmith77, Jan 18 '23 07:01