zemberek-nlp

Zemberek creates duplicate WordAnalysis results

Status: Open. ssaltin opened this issue 8 years ago (4 comments).

For the input "yoksa", Zemberek generates 6 WordAnalysis results, 2 of which are duplicates (entry 2 repeats entry 1, and entry 5 repeats entry 4):

0 = {WordAnalysis@6912} "[(yoksa:yoksa) (Conj)]"
1 = {WordAnalysis@6913} "[(yok:yok) (Adj)(Verb;Cond:sa+A3sg)]"
2 = {WordAnalysis@6914} "[(yok:yok) (Adj)(Verb;Cond:sa+A3sg)]"
3 = {WordAnalysis@6915} "[(yok:yok) (Noun;A3sg+Pnon+Nom)(Verb;Cond:sa+A3sg)]"
4 = {WordAnalysis@6916} "[(yok:yok) (Adj)(Noun;A3sg+Pnon+Nom)(Verb;Cond:sa+A3sg)]"
5 = {WordAnalysis@6917} "[(yok:yok) (Adj)(Noun;A3sg+Pnon+Nom)(Verb;Cond:sa+A3sg)]"


ssaltin, Dec 14 '16 13:12

As I just realized, their dictionary items are different:

yok [P:Adj; A:NoVoicing]
yok [P:Adj; A:Voicing]

But don't they still produce the same morphological result?

ssaltin, Dec 14 '16 13:12

Thanks, I am aware of this problem; hopefully it will be fixed in the next version.

ahmetaa, Dec 15 '16 11:12

Version 0.12 still creates duplicate results. I did not test with 0.13.

Input: yoksa
yoksa
[yoksa:Conj] yoksa:Conj
[yoksamak:Verb] yoksa:Verb+Imp+A2sg
[yok:Adj] yok:Adj|Zero→Verb+sa:Cond+A3sg
[yok:Adj] yok:Adj|Zero→Verb+sa:Cond+A3sg
[Yok:Noun,Prop] yok:Noun+A3sg|Zero→Verb+sa:Cond+A3sg
[yok:Noun] yok:Noun+A3sg|Zero→Verb+sa:Cond+A3sg
Disambiguation result:
[yoksa:Conj] yoksa:Conj

mdakin, May 24 '18 06:05

0.13.0 also produces duplicate results for this. Because the voicing attribute is optional for "yok", two stem transitions are created for "yok" when the graph is constructed. For inputs like "yoktan" or "yok", where the voicing does not surface, paths through both stem transitions terminate successfully.
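To illustrate why both paths survive, here is a toy sketch. It is not Zemberek's graph code: `StemTransition`, `Context`, and `matchingAttributes` are hypothetical stand-ins, and the voicing rule only handles a final "k" (yok → yoğ). It assumes the NoVoicing variant yields "yok" in every context, while the Voicing variant yields "yok" before consonants or at the end of the word and "yoğ" before vowels. Requires Java 16+ (records, switch expressions).

```java
import java.util.ArrayList;
import java.util.List;

public class VoicingStems {
    // Hypothetical: where a stem surface form may be followed.
    enum Context { ANY, CONSONANT_OR_END, VOWEL }

    // Hypothetical stand-in for a stem transition in the morphotactic graph.
    record StemTransition(String surface, Context context, String attribute) {}

    // Turkish vowels a e ı i o ö u ü, escaped to avoid source-encoding issues.
    static final String VOWELS = "ae\u0131io\u00f6u\u00fc";

    // Toy voicing for a final 'k' only (yok -> yoğ); real voicing also covers p, ç, t, nk.
    static String voice(String lemma) {
        return lemma.substring(0, lemma.length() - 1) + "\u011f";
    }

    // Optional voicing: transitions for both the NoVoicing and the Voicing variant.
    static List<StemTransition> transitions(String lemma) {
        List<StemTransition> t = new ArrayList<>();
        t.add(new StemTransition(lemma, Context.ANY, "NoVoicing"));
        t.add(new StemTransition(lemma, Context.CONSONANT_OR_END, "Voicing"));
        t.add(new StemTransition(voice(lemma), Context.VOWEL, "Voicing"));
        return t;
    }

    // Attribute of every stem transition that can start an analysis of `word`.
    static List<String> matchingAttributes(String lemma, String word) {
        List<String> matches = new ArrayList<>();
        for (StemTransition st : transitions(lemma)) {
            if (!word.startsWith(st.surface())) continue;
            String rest = word.substring(st.surface().length());
            boolean vowelNext = !rest.isEmpty() && VOWELS.indexOf(rest.charAt(0)) >= 0;
            boolean ok = switch (st.context()) {
                case ANY -> true;
                case CONSONANT_OR_END -> !vowelNext;
                case VOWEL -> vowelNext;
            };
            if (ok) matches.add(st.attribute());
        }
        return matches;
    }

    public static void main(String[] args) {
        for (String w : List.of("yok", "yoktan", "yoksa", "yoka", "yo\u011fa")) {
            System.out.println(w + " -> " + matchingAttributes("yok", w));
        }
    }
}
```

In this model "yok", "yoktan", and "yoksa" are each matched by both variants (one duplicate path per word), while "yoka" and "yoğa" are each matched by only one, because there the voicing decision is visible in the surface form.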

One possible solution for such words is to use the reference attribute. For example:

yok [P:Adj; A:Voicing]
yok [P:Adj; A:NoVoicing, Ref:yok_Adj]   (points to the first item)

After analysis, if the morphemes are equal and both the referenced item and the referencing item appear in the results, one of them can be deleted. This can be done as a post-processing step.
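A minimal sketch of that post-processing step, assuming each analysis exposes its dictionary item (with an id and an optional `Ref`) and its morpheme sequence. `Item`, `Analysis`, `removeReferencedDuplicates`, and the id `yok_Adj_NoVoicing` are hypothetical stand-ins, not Zemberek's actual classes or lexicon ids; the morpheme list is a simplified rendering of `yok:Adj|Zero→Verb+sa:Cond+A3sg`. Requires Java 16+ (records).

```java
import java.util.ArrayList;
import java.util.List;

public class RefDeduplication {
    // Hypothetical stand-ins; `ref` is null when the item has no Ref attribute.
    record Item(String id, String lemma, String ref) {}
    record Analysis(Item item, List<String> morphemes) {}

    // Drops an analysis when its item references another item that also produced
    // an analysis with exactly the same morphemes; otherwise keeps it (order preserved).
    static List<Analysis> removeReferencedDuplicates(List<Analysis> analyses) {
        List<Analysis> kept = new ArrayList<>();
        for (Analysis a : analyses) {
            String ref = a.item().ref();
            boolean duplicate = ref != null && analyses.stream().anyMatch(b ->
                    b != a
                    && b.item().id().equals(ref)
                    && b.morphemes().equals(a.morphemes()));
            if (!duplicate) kept.add(a);
        }
        return kept;
    }

    public static void main(String[] args) {
        Item voicing = new Item("yok_Adj", "yok", null);
        Item noVoicing = new Item("yok_Adj_NoVoicing", "yok", "yok_Adj");
        List<String> morphemes = List.of("Adj", "Zero", "Verb", "Cond", "A3sg");
        List<Analysis> results = List.of(
                new Analysis(voicing, morphemes),
                new Analysis(noVoicing, morphemes));
        System.out.println(removeReferencedDuplicates(results).size());
    }
}
```

The referencing analysis is removed only when the referenced item's analysis is also present with the same morphemes. If a word can be analyzed through the NoVoicing item alone (for example when the surface form rules out the voiced stem), that analysis is kept.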

ahmetaa, May 24 '18 06:05