MorphMan
Words in the external.db file aren't being recognized by MorphMan, and it's still giving me 1T cards containing words that are in external.db.
See the title. The k-value updates, but that's about it.
What language is this happening for? I've used the external.db feature for Japanese and it works fine, but perhaps there are issues elsewhere.
Chinese.
(Email reply to Shan Rauf, Wednesday, December 25, 2019, 1:36 PM)
I see. I'll look into this once I have a solid test suite working for the add-on (it'll make testing these things a lot simpler).
Cool, appreciate it!
(Email reply to Shan Rauf, Thursday, December 26, 2019, 1:53 PM)
Hey @ollieeeq, I just tested the external.db feature with Chinese and it worked fine for me. It's possible that you selected the wrong morphemizer when generating the external.db. For example, when I want to generate an external.db from a text file, I go into the database manager, change "Language w/ Spaces" to "Chinese", extract morphemes from that text file into a db, rename that db "external.db", and make sure the file is in my dbs folder. After that, recalcing should work.
Check out the section of Matt vs Japan's post on Morphman that talks about the database manager: https://massimmersionapproach.com/table-of-contents/anki/morphman/#database_manager
If my suggestions above don't work for you, could you provide me a way to reproduce the issues you're having?
I have a similar issue. For me, external.db isn't ignored outright; it just isn't recognized consistently. MorphMan treats some of the words in my new cards that appear in external.db as already known, but not others. I tried to work around this by deleting external.db and importing all of its words into an Anki deck tagged as already known, but the problem persists. Could this be some sort of character encoding issue? This is with Japanese sentence cards, by the way.
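One quick way to test the encoding hypothesis is to copy a word out of the already-known card (or external.db) and the same word out of the new card, then compare the two strings directly. The sketch below is plain Python and not part of MorphMan; `diagnose` is a hypothetical helper name. It assumes you have pasted both strings in by hand, and it uses NFKC normalization, which folds variants such as half-width katakana into their standard forms.

```python
import unicodedata


def same_text(a: str, b: str) -> bool:
    # True only if the two strings are identical code point for code point.
    return a == b


def same_after_nfkc(a: str, b: str) -> bool:
    # NFKC folds compatibility variants (e.g. half-width katakana) into
    # their standard forms, so a mismatch that disappears here is a
    # representation difference rather than a parsing difference.
    return unicodedata.normalize("NFKC", a) == unicodedata.normalize("NFKC", b)


def diagnose(a: str, b: str) -> str:
    # Hypothetical helper: classify why two copies of a word might not match.
    if same_text(a, b):
        return "identical"
    if same_after_nfkc(a, b):
        return "differ only in Unicode representation"
    return "genuinely different text"


print(diagnose("強い", "強い"))  # identical
print(diagnose("ﾏﾏ", "ママ"))    # differ only in Unicode representation
```

If both copies come back "identical", encoding is ruled out and the mismatch has to come from somewhere else, such as how the text is parsed.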
Here's an example of the strange behavior. I have a test card whose MorphMan-scanned field contains the following (all words I have tagged as already known in other cards): そう ママ 中 ざわめき ボリス 集まり ほら 学 回. When I run a recalc, MorphMan reports the unknown words as 回, 学, 中.
Now I make another test card whose scanned field contains 学 回 and tag it as already known. After a recalc, MorphMan says the unknown words in the first card are 学 and 中. Then, if I change the second test card to 回 学 and run a recalc again, MorphMan says the only unknown word in the first test card is 中.
OK, I think I understand the problem now. Let me show it with another example. I have a sentence card extracted from subtitles with the following in the field scanned by MorphMan:
(ペコ) 強い の? そいつ
MorphMan says the single unknown morph in this sentence is 強い, but I have 強い tagged as already known in another card. The difference shows up when I click "view morphemes" to see how MorphMan parses the scanned field. In the card tagged as already known, I get the following morpheme for 強い:
0 強い 強い 強い ツヨイ 形容詞 一般
I'm not entirely sure what all of the fields represent, but everything seems to be in order here: it is parsed as an adjective pronounced "tsuyoi". In the sentence card above, however, I get:
0 強い 強い 強い シイ 名詞 普通名詞
The kanji still match, but this time it is parsed as a noun pronounced "shii", which is complete nonsense. The results in my previous post are also caused by this type of parsing error.
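Why a matching kanji is not enough can be sketched in a few lines of Python. This assumes, as the two "view morphemes" outputs suggest, that a morph only counts as known when all of its parsed fields match, not just its written form. The `Morph` class, its field names, and the helpers `is_known` and `surface_matches` are hypothetical and do not reflect MorphMan's actual data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Morph:
    # Hypothetical fields modelled on the "view morphemes" columns;
    # this is not MorphMan's real class.
    base: str
    inflected: str
    reading: str
    pos: str
    sub_pos: str


def is_known(morph: Morph, known: set) -> bool:
    # Frozen dataclasses compare and hash on every field, so a different
    # reading or part of speech makes this a different morph.
    return morph in known


def surface_matches(morph: Morph, known: set) -> bool:
    # Looser check on the written form only, for comparison.
    return any(m.base == morph.base for m in known)


# Parse stored on the card tagged as already known.
known = {Morph("強い", "強い", "ツヨイ", "形容詞", "一般")}

# Parse of the same word inside the subtitle sentence.
in_sentence = Morph("強い", "強い", "シイ", "名詞", "普通名詞")

print(is_known(in_sentence, known))         # False
print(surface_matches(in_sentence, known))  # True
```

Under that assumption, a parser that reads 強い as シイ in one sentence produces a different morph from the ツヨイ entry marked as known, which would explain why the word keeps showing up as unknown.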
So is there anything I can do about this type of error? I get it quite often in my sentence cards, especially with single-kanji words. A lot of words also get decomposed in very unexpected ways. I have the MeCab UniDic add-on installed, so MorphMan is supposed to use that for parsing.
Hi, I'm still having this issue. Did you find a solution for it?
I don't really have a complete solution, but I've found that deleting MeCab and relying on the older, supposedly worse parser from the Japanese add-on gives me better results, though still not perfect ones. I also tried to teach MorphMan with single-word cards, and that generally doesn't work well, especially for single-kanji words; the parser probably uses the context around each word. If you keep tagging actual sentences as already known, MorphMan will get it eventually. I've been using it since my last reply, and about 3/4 of the sentences it gives me are useful now.
Oh, and if you make changes to your external.db, I suggest deleting your entire MorphMan database and regenerating it from scratch instead of just running a normal recalc. It gave me some funny results before I did that.