
Modified edit-distance matching is missing some edits or has a suboptimal weighting algorithm

Open aarppe opened this issue 5 years ago • 2 comments

The following is a selected set of test strings for which weighted edit-distance matching sometimes works and sometimes does not [+: expected behavior; -: unexpected behavior; *: fixed - UPDATED on 11.1.2023]:

+ mitsiw -> + mîciw
* micow -> * mîciw
* mitsow -> * mîciw
+ neeyu -> + niya, + niyâ
* neeyuh -> * niya, * niyâ
+ nigi-nidawi-wapamaw -> + nikî-nitawi-wâpamâw
* nigi-nidawi-wabamaw -> * nikî-nitawi-wâpamâw
* nibaw -> * nipâw
* nohte -> * nôhtê-
* nohte- -> * nôhtê-
+ mitâs -> + mitâs
* nitâs -> + nitâs, * mitâs [not linked to lemma `mitâs`] 
* nitas -> - mitâs, * nitâs [neither result]
* mitas -> * mitâs

Also, in principle, the edit weighting should produce the following ranking:

ewapamat -> ê-wâpamat < ê-wâpamât (1 edit less)
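
To make the intended ranking concrete, here is a minimal Python sketch of a weighted edit distance. The weights (0.25 for adding a circumflex, 0.25 for a hyphen, 1.0 for anything else) and all names in it are hypothetical, chosen only for illustration; they are not the weights used by morphodict or by the spell-relax FST.

```python
import unicodedata

# Hypothetical weights, NOT those of the actual spell-relax FST.
DIACRITIC_COST = 0.25  # typed plain vowel, target has a circumflex (a -> â)
HYPHEN_COST = 0.25     # inserting or deleting "-"
DEFAULT_COST = 1.0     # any other insertion, deletion, or substitution

# Plain vowel -> circumflexed counterpart (NFC-normalized to one code point).
LONG_VOWELS = {
    plain: unicodedata.normalize("NFC", long)
    for plain, long in {"a": "â", "e": "ê", "i": "î", "o": "ô"}.items()
}


def sub_cost(typed: str, target: str) -> float:
    """Cost of replacing one typed character with one target character."""
    if typed == target:
        return 0.0
    if LONG_VOWELS.get(typed) == target:
        return DIACRITIC_COST
    return DEFAULT_COST


def indel_cost(ch: str) -> float:
    """Cost of inserting or deleting a single character."""
    return HYPHEN_COST if ch == "-" else DEFAULT_COST


def weighted_distance(typed: str, target: str) -> float:
    """Dynamic-programming edit distance with per-operation weights."""
    typed = unicodedata.normalize("NFC", typed)
    target = unicodedata.normalize("NFC", target)
    # prev[j] = cost of turning the typed prefix seen so far into target[:j]
    prev = [0.0]
    for ch in target:
        prev.append(prev[-1] + indel_cost(ch))
    for t in typed:
        cur = [prev[0] + indel_cost(t)]
        for j, ch in enumerate(target, start=1):
            cur.append(min(
                prev[j] + indel_cost(t),        # delete the typed character
                cur[j - 1] + indel_cost(ch),    # insert the target character
                prev[j - 1] + sub_cost(t, ch),  # substitute or match
            ))
        prev = cur
    return prev[-1]


candidates = ["ê-wâpamât", "ê-wâpamat"]
ranked = sorted(candidates, key=lambda c: weighted_distance("ewapamat", c))
print(ranked)
```

Under these assumed weights, `ê-wâpamat` needs one diacritic edit fewer than `ê-wâpamât`, so it sorts first, which is the ranking asked for above.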

aarppe, Jan 18 '20 04:01

Here is most of the above run through, and recognized by, a descriptive FST with a weighted spell-relax:

hfst-lookup -q ../../inc/crk-anl-desc-w.hfst 
nitas
nitas	mitâs+N+A+D+Px1Sg+Sg	0.250000
nitas	mitâs+N+I+D+Px1Sg+Sg	0.250000
nitas	nitâs+N+A+D+Px1Sg+Sg	0.250000
nitas	nitâs+N+I+D+Px1Sg+Sg	0.250000

mitas
mitas	mitâs+N+A+D+PxX+Sg	0.250000
mitas	mitâs+N+I+D+PxX+Sg	0.250000
mitas	mihtâtêw+V+TA+Imp+Imm+2Sg+3SgO	0.750000

mitsiw
mitsiw	mîciw+V+TI+Ind+Prs+3Sg	0.750000

micow
micow	mîciw+V+TI+Ind+Prs+3Sg	0.750000

mitsow
mitsow	mîciw+V+TI+Ind+Prs+3Sg	1.250000

neeyu
neeyu	niya+Pron+Pers+1Sg	0.000000
neeyu	niyâ+Ipc	0.000000

neeyuh
neeyuh	niya+Pron+Pers+1Sg	0.000000
neeyuh	niyâ+Ipc	0.000000

nigi-nitawi-wapamaw
nigi-nitawi-wapamaw	PV/nitawi+wâpamêw+V+TA+Ind+Prt+1Sg+3SgO	0.750000

mitâs
mitâs	mitâs+N+A+D+PxX+Sg	0.000000
mitâs	mitâs+N+I+D+PxX+Sg	0.000000
mitâs	mihtâtêw+V+TA+Imp+Imm+2Sg+3SgO	0.500000

nitâs
nitâs	mitâs+N+A+D+Px1Sg+Sg	0.000000
nitâs	mitâs+N+I+D+Px1Sg+Sg	0.000000
nitâs	nitâs+N+A+D+Px1Sg+Sg	0.000000
nitâs	nitâs+N+I+D+Px1Sg+Sg	0.000000

nitas
nitas	mitâs+N+A+D+Px1Sg+Sg	0.250000
nitas	mitâs+N+I+D+Px1Sg+Sg	0.250000
nitas	nitâs+N+A+D+Px1Sg+Sg	0.250000
nitas	nitâs+N+I+D+Px1Sg+Sg	0.250000

mitas
mitas	mitâs+N+A+D+PxX+Sg	0.250000
mitas	mitâs+N+I+D+PxX+Sg	0.250000
mitas	mihtâtêw+V+TA+Imp+Imm+2Sg+3SgO	0.750000
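
For comparing such runs against the expected results, a small Python sketch like the following could group the analyses by input string and sort them by weight. It assumes the output is three tab-separated columns (input, analysis, weight), as in the listing above; the function name `parse_lookup` and the sample string are hypothetical and only for illustration.

```python
from collections import defaultdict


def parse_lookup(output: str) -> dict[str, list[tuple[str, float]]]:
    """Group analyses by input string, lowest weight first.

    Assumes each result line has exactly three tab-separated fields;
    blank lines and echoed input lines are skipped.
    """
    results = defaultdict(list)
    for line in output.splitlines():
        fields = line.split("\t")
        if len(fields) != 3:
            continue
        surface, analysis, weight = fields
        results[surface].append((analysis, float(weight)))
    for analyses in results.values():
        # sorted by weight; ties keep their original order (stable sort)
        analyses.sort(key=lambda pair: pair[1])
    return dict(results)


# Sample copied from the output above.
sample = (
    "mitas\n"
    "mitas\tmihtâtêw+V+TA+Imp+Imm+2Sg+3SgO\t0.750000\n"
    "mitas\tmitâs+N+A+D+PxX+Sg\t0.250000\n"
    "mitas\tmitâs+N+I+D+PxX+Sg\t0.250000\n"
)

parsed = parse_lookup(sample)
for analysis, weight in parsed["mitas"]:
    print(f"{weight:.2f}\t{analysis}")
```

Sorting by weight puts the two `mitâs` analyses ahead of the `mihtâtêw` reading, which makes it easier to check at a glance whether the expected lemma is the top-ranked one.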

aarppe, Jan 19 '20 02:01

The above errors appear to have been resolved.

aarppe, Jan 11 '23 23:01