anki-morphs Add option to skip cards that have the same lemma

Open RyanMcEntire opened this issue 10 months ago • 4 comments

Describe the bug

Marking a word as known does not prevent a new inflection from appearing.

AnkiMorphs treats each inflection of a root word as a new word and sets it as the new priority. Other inflections of the word are not suspended or marked as known.

Running Recalc again and changing the settings don't improve this behavior.

Steps to reproduce the behavior

Use the subs2srs library as the morph source
Use ko_core_news_sm with spaCy to generate a frequency list from the corpus, or use the collection frequency
Use the ko_core_news_sm/md/lg morphemizer in the note filters
Choose the setting "am-unknowns field shows morph lemmas"
Check "suspend new cards with only known morphs"
Recalc
Start reviews and mark morphs as known
Watch the same "lemma" come up for review each time it appears as a different inflection in a sentence

Expected behavior

I expect the morphemizer to distill a word to something like a lemma (spaCy isn't capable of doing this properly with its Korean models, but that may or may not be a separate issue). I'd expect AnkiMorphs to show me new words and bury variations of the same word. Just as if I were learning English, I don't need a separate card for walk, walking, walked, will walk, might walk, want to walk, and so on for every single word.

Currently, it treats each inflection as a new word, so it behaves no differently than if the text were being split on spaces.
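
To make the expected behavior concrete, here is a minimal sketch of the skipping logic I have in mind. The `Card` class and `cards_to_skip` function are hypothetical names made up for illustration, not part of AnkiMorphs, and the sketch assumes the morphemizer already provides a usable lemma for each unknown morph.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Card:
    # Hypothetical stand-in for a note; not an AnkiMorphs type.
    sentence: str
    inflection: str  # the unknown morph as it appears in the sentence
    lemma: str       # the dictionary form assigned by the morphemizer


def cards_to_skip(cards, known_lemmas):
    """Return the cards whose unknown morph has an already-known lemma."""
    return [card for card in cards if card.lemma in known_lemmas]


cards = [
    Card("I walked home.", "walked", "walk"),
    Card("She is walking.", "walking", "walk"),
    Card("He ran away.", "ran", "run"),
]

# Once "walk" is known, both of its inflections should be skipped.
skipped = cards_to_skip(cards, known_lemmas={"walk"})
print([card.inflection for card in skipped])  # ['walked', 'walking']
```

Comparing on the lemma instead of the inflection is the whole point: the "ran" card stays in the queue because "run" has not been learned yet.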

My setup

Operating System: Windows 11
Anki Version: 23.12.1 (1a1d4d54)
AnkiMorphs Version: 2.1.0

Additional context

spaCy has three Korean models: ko_core_news_sm, ko_core_news_md, and ko_core_news_lg. Functionally, they all work the same way.

The website states that it lemmatizes Korean, but this isn't technically true. The lemma_ value returned by spaCy looks like this, with the raw word on the left and the "lemma" value on the right:

('준비했죠', '준비+하+었+죠')
('위해서', '위하+어서')
('먹을', '먹+ㄹ')

The "lemma" isn't a lemma at all, but rather a breakdown of the word into its parts. The left-most part is only the stem, which isn't the dictionary form of the word. The verb for "to eat" is 먹다, not 먹; on its own, 먹 is a rare noun for an ink stick used for making writing ink.

A proper lemma value for these would look like this, placing them in their dictionary form:

('준비했죠', '준비하다')
('위해서', '위하다')
('먹을', '먹다')
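
As a rough illustration of how the dictionary forms above could be derived from spaCy's output, here is a naive sketch. The function name and the `VERBALIZERS` set are my own assumptions, and the heuristic only makes sense for verbs and adjectives: it would wrongly append 다 to nouns, so a real implementation would also need part-of-speech information.

```python
# Hypothetical set of stems that attach to a preceding noun to form a verb.
VERBALIZERS = {"하", "되"}


def naive_dictionary_form(spacy_lemma: str) -> str:
    """Guess a dictionary form from a '+'-separated breakdown.

    Assumes the token is a verb or adjective. Keeps the left-most part,
    absorbs any directly following verbalizer (e.g. 준비 + 하), drops the
    remaining endings, and appends 다.
    """
    parts = spacy_lemma.split("+")
    stem = parts[0]
    for part in parts[1:]:
        if part not in VERBALIZERS:
            break
        stem += part
    return stem + "다"


for breakdown in ["준비+하+었+죠", "위하+어서", "먹+ㄹ"]:
    print(breakdown, "->", naive_dictionary_form(breakdown))
```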

To explain with a single word, this is what spaCy produces:

('먹다', '먹+다')
('먹었어', '먹+었+어')
('먹는데', '먹+는+데')

Lemmatized properly, it would look like this, where these would all be inflections of the same word:

('먹다', '먹다')
('먹었어', '먹다')
('먹는데', '먹다')

As you can see from the frequency list generated by AnkiMorphs, a word like 괜찮다 takes up 1034 slots on the frequency list. The value of using a morphemizer rather than splitting on spaces is basically lost entirely.
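
The effect on the frequency list can be shown with a small sketch. The occurrence data is made up for illustration, reusing the 먹다 examples above, and it assumes each occurrence comes with a proper lemma.

```python
from collections import Counter

# Made-up (surface form, proper lemma) pairs, as if read from a corpus.
occurrences = [
    ("먹다", "먹다"),
    ("먹었어", "먹다"),
    ("먹는데", "먹다"),
    ("먹었어", "먹다"),
]

# Counting by inflection gives one frequency-list entry per surface form.
by_inflection = Counter(surface for surface, _ in occurrences)

# Counting by lemma collapses all inflections into a single entry.
by_lemma = Counter(lemma for _, lemma in occurrences)

print(len(by_inflection))  # 3
print(len(by_lemma))       # 1
```

Keyed by lemma, the word occupies one slot whose count is the sum of all its inflections, instead of one slot per inflection.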

Mar 31 '24 17:03 RyanMcEntire
