MorphMan
Add support for spaCy.
Fixes #162
- A previous attempt created an AnkiSpacy addon that acted as a package manager for installing spaCy and its models and for notifying other addons. This posed several issues (mainly on Windows). See https://github.com/rteabeault/AnkiSpacy/issues/7
- This solution is essentially a continuation of the @ianki solution here https://github.com/kaegi/MorphMan/pull/193
- It uses a python executable path to execute spaCy to discover what models are installed.
- It uses the same python path to run a subprocess that listens on stdin and uses spaCy to parse the passed text.
- An unfortunate side effect of this is that there is currently no good way to kill the subprocess after a recalc. This may not be an issue in practice, but in the future it may be good to have an open/close for morphemizers. These could be used to initialize the subprocess and then close it afterwards.
- Created a MorphemizerRegistry that contains all registered morphemizers. Adding and removing morphemizers fires events. The MorphemizerComboBox listens to these events to keep itself updated.
- A fake_aqt was being added to sys.modules for tests, but this was done in all_tests.py. This meant you could not run the tests individually. Moved all modifications of sys.modules into fake_aqt and added some additional sys.modules entries needed for new tests.
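For readers who want a picture of the subprocess approach described in the bullets above, here is a minimal sketch, not the code in this PR. It assumes a line-delimited JSON protocol; `WORKER_SOURCE`, `LineWorker`, `parse` and `close` are hypothetical names, and the worker splits on whitespace as a stand-in for loading a spaCy model. The `close` method illustrates the open/close idea mentioned above.

```python
import json
import subprocess
import sys

# Hypothetical stand-in worker. A real worker would load a spaCy model and
# return its tokens; this one only splits on whitespace so the sketch runs
# without spaCy installed. It reads one JSON request per line from stdin and
# writes one JSON reply per line to stdout.
WORKER_SOURCE = r'''
import json, sys
for line in sys.stdin:
    text = json.loads(line)
    sys.stdout.write(json.dumps(text.split()) + "\n")
    sys.stdout.flush()
'''


class LineWorker:
    """Keeps one child interpreter alive and talks to it over pipes."""

    def __init__(self, python_path=sys.executable):
        # python_path plays the role of the configured python executable path.
        self.proc = subprocess.Popen(
            [python_path, "-c", WORKER_SOURCE],
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            encoding="utf-8",
        )

    def parse(self, text):
        # json.dumps escapes newlines and non-ASCII characters by default,
        # so each request occupies exactly one ASCII line on the pipe.
        self.proc.stdin.write(json.dumps(text) + "\n")
        self.proc.stdin.flush()
        return json.loads(self.proc.stdout.readline())

    def close(self):
        # Closing stdin ends the worker's read loop, so the child exits.
        self.proc.stdin.close()
        self.proc.stdout.close()
        self.proc.wait()
```

A caller would create one `LineWorker` before a recalc, call `parse` per expression, and call `close` afterwards so no subprocess is left running.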
@ianki, @thinkingbox12 New pull request for spaCy support. Tests are passing locally for me but failing the automated build. I will look into that shortly but I wanted you both to take a look at this.
Sorry for the delayed reply. It seems to me that this new version is not communicating with the spacy addon. When going into morphman to change the preferences for recalc, it could not see the installed models. I tried this on a fresh profile as well.
Let me know if there's anything else besides that specifically you want checked.
@thinkingbox12 did you read the instructions in the readme? This does not use the spacy addon that I wrote.
Nope. My fault. Will read the readme and try again tomorrow.
I have installed Ubuntu 18.04 and Python 3.7 and am still unable to reproduce this test failure. I will continue to investigate.
Will try Ubuntu today. Got caught up with other things, sorry about the delay.
Well, everything seems to be working fine for me on Ubuntu 20.04.1 LTS 64-bit with Python 3.8.5. I could download the model and link it properly in the terminal. Not sure if this has anything to do with it, but I kept the old spaCy package manager addon in the profile. I don't think it makes a difference, though, because Python obviously couldn't see the models I had previously installed through the package manager. I could recalc properly with a few Japanese notes, and the morph count updated. Reading known.db also made sense to me. TL;DR: everything is good on my end. Sorry again for the delay.
Tests fixed. @ianki Can you please take a look? Thanks!
Any updates on merging this into the default MorphMan version?
I'm getting this exception on a fresh install on Windows, after it has been running for a while. @rteabeault any guesses?
Error
An error occurred. Please start Anki while holding down the shift key, which will temporarily disable the add-ons you have installed.
If the issue only occurs when add-ons are enabled, please use the Tools > Add-ons menu item to disable some add-ons and restart Anki, repeating until you discover the add-on that is causing the problem.
When you've discovered the add-on that is causing the problem, please report the issue on the add-on support site.
Debug info:
Anki 2.1.35 (84dcaa86) Python 3.8.0 Qt 5.14.2 PyQt 5.14.2
Platform: Windows 10
Flags: frz=True ao=True sv=1
Add-ons, last update check: 2021-03-18 23:12:43
Caught exception:
Traceback (most recent call last):
  File "C:\Users\AppData\Roaming\Anki2\addons21\MorphMan\__init__.py", line 17, in onMorphManRecalc
    main.main()
  File "C:\Users\AppData\Roaming\Anki2\addons21\MorphMan\morph\main.py", line 573, in main
    allDb = mkAllDb(cur)
  File "C:\Users\AppData\Roaming\Anki2\addons21\MorphMan\morph\main.py", line 195, in mkAllDb
    ms = getMorphemes(morphemizer, fieldValue, ts)
  File "C:\Users\AppData\Roaming\Anki2\addons21\MorphMan\morph\morphemes.py", line 166, in getMorphemes
    ms = morphemizer.getMorphemesFromExpr(expression)
  File "C:\Users\AppData\Roaming\Anki2\addons21\MorphMan\morph\morphemizer.py", line 51, in getMorphemesFromExpr
    morphs = self._getMorphemesFromExpr(expression)
  File "C:\Users\AppData\Roaming\Anki2\addons21\MorphMan\morph\deps\spacy\morphemizer.py", line 40, in _getMorphemesFromExpr
    self.proc.stdin.flush()
OSError: [Errno 22] Invalid argument
Here is another exception; I get either this one or the previous one. I tried reinstalling spaCy and the models many times, with no luck. Is spaCy support still being actively developed? I've been looking into some cool Japanese features in the meantime.
Caught exception:
Traceback (most recent call last):
  File "C:\Users\AppData\Roaming\Anki2\addons21\MorphMan\__init__.py", line 17, in onMorphManRecalc
    main.main()
  File "C:\Users\AppData\Roaming\Anki2\addons21\MorphMan\morph\main.py", line 573, in main
    allDb = mkAllDb(cur)
  File "C:\Users\AppData\Roaming\Anki2\addons21\MorphMan\morph\main.py", line 195, in mkAllDb
    ms = getMorphemes(morphemizer, fieldValue, ts)
  File "C:\Users\AppData\Roaming\Anki2\addons21\MorphMan\morph\morphemes.py", line 166, in getMorphemes
    ms = morphemizer.getMorphemesFromExpr(expression)
  File "C:\Users\AppData\Roaming\Anki2\addons21\MorphMan\morph\morphemizer.py", line 51, in getMorphemesFromExpr
    morphs = self._getMorphemesFromExpr(expression)
  File "C:\Users\AppData\Roaming\Anki2\addons21\MorphMan\morph\deps\spacy\morphemizer.py", line 41, in _getMorphemesFromExpr
    morphs = json.loads(self.proc.stdout.readline())
  File "json\__init__.py", line 357, in loads
  File "json\decoder.py", line 337, in decode
  File "json\decoder.py", line 355, in raw_decode
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
@rteabeault The problem above results from a terminal encoding problem on Windows (with the Japanese model, specifically). My guess is that this behavior was introduced by a recent Windows feature update: the terminal effectively displays Japanese characters as the Unicode 'unknown character' glyph, so when they are passed through to SudachiPy it fails and an exception results.
I suspect that changing the region and locale to Japan, so that the terminal supports UTF-8 and Japanese glyphs, might solve the problem, but this has not been tested yet and is probably not a practical solution for most users of this addon.
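If the encoding diagnosis above is right, a less invasive option than changing the system locale might be to force UTF-8 on the child's pipes. The sketch below is untested against this PR and only illustrates the idea: `run_utf8_child` and `ECHO` are hypothetical names, while `PYTHONIOENCODING` is the standard CPython environment variable that overrides the encoding of the child's stdin/stdout.

```python
import os
import subprocess
import sys


def run_utf8_child(code, text):
    """Run `code` in a child interpreter with UTF-8 stdio on both ends."""
    # PYTHONIOENCODING makes the child's sys.stdin/sys.stdout use UTF-8
    # regardless of the console code page or system locale.
    env = dict(os.environ, PYTHONIOENCODING="utf-8")
    result = subprocess.run(
        [sys.executable, "-c", code],
        input=text,
        capture_output=True,
        encoding="utf-8",  # the parent side of the pipes is UTF-8 as well
        env=env,
        check=True,
    )
    return result.stdout


# Hypothetical child program: echoes stdin back to stdout unchanged.
ECHO = "import sys; sys.stdout.write(sys.stdin.read())"
```

With both ends pinned to UTF-8, `run_utf8_child(ECHO, "日本語")` should return the same string rather than replacement characters, independent of the region setting.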
The most recent version of this repo works just fine on Ubuntu.
I am interested in developing for spaCy 3.0, which might simplify the link process, since linking was revamped there and the old approach is considered obsolete. AFAIK some of the syntax has changed slightly, and it doesn't currently work.
EDIT
Oddly enough though, on Ubuntu, when upgrading from sudachipy 0.4.5 (which worked) to 0.4.9, I got the same exception that I did on Windows. Upgrading once again on Ubuntu to 0.5.2 resolved the issue. Is this coincidental?
Just wondering: if spaCy provides better analysis than MeCab, then perhaps it would be better as a new add-on? I almost see MorphMan as an add-on for Japanese and not other languages (those all came later).
There's a lot in this repo, and maybe by replacing MeCab and only using spaCy, lots of code could be removed and the codebase simplified?
Hey guys, sorry for the long wait on this. What's the current state of this support? Should I look to merge this?
I was able to rebase this, and it seems to work OK after fixing handling of new lines in the expressions.
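To illustrate why new lines in expressions matter for a line-based stdin/stdout protocol, here is a small sketch. It is an assumption about the failure mode, not the fix used in the rebase; `frame_raw` and `frame_json` are hypothetical helpers.

```python
import json


def frame_raw(expression):
    # Naive framing: an embedded newline turns one request into two lines,
    # so replies no longer match requests one-to-one.
    return expression + "\n"


def frame_json(expression):
    # JSON framing: the newline is escaped as \n inside the string literal,
    # so the request always occupies exactly one line.
    return json.dumps(expression) + "\n"


expr = "first line\nsecond line"
raw_lines = frame_raw(expr).splitlines()
json_lines = frame_json(expr).splitlines()
```

Here `raw_lines` holds two entries while `json_lines` holds one, and `json.loads(json_lines[0])` recovers the original expression including its newline.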
Hey all... What do I have to do to merge this into my MorphMan installation?
> Just wondering: if spaCy provides better analysis than MeCab, then perhaps it would be better as a new add-on? I almost see MorphMan as an add-on for Japanese and not other languages (those all came later).
> There's a lot in this repo, and maybe by replacing MeCab and only using spaCy, lots of code could be removed and the codebase simplified?
This is not true. People use MorphMan for other languages, and there is no reason why they shouldn't benefit from spaCy.