MorphMan
Add initial support for SpaCy and SudachiPy parsers.
So I am nearly finished with a separate Anki plugin that will install and manage spacy and its models for you. You won't need to have Python installed on your computer. Additionally, I have made changes to Morphman so that it listens for events from this spacy plugin and registers morphemizers for all installed language models.
That's great, I won't land this then. What's your timeline for the new plugin?
Most of what is left is clean-up, UI styling and testing. Realistically, unless my paying job gets in the way, I believe another week. Here is the UI currently.
@rteabeault how's it going with the SpaCy installer?
@ianki I thought I was almost done, but I just ran into a significant problem with the package installation. My goal was to allow people to install all needed packages without having a separate python installed. I thought I had this working, but what I didn't realize is that pip is (in some cases) forking a subprocess and calling python directly on setup.py. My own installed python was masking this problem. I am not sure how I am going to fix this, which really blows as this was coming together nicely. I will keep trying to figure it out, but we may want to proceed with your solution for now. My apologies.
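A problem like this stays hidden on any machine that happens to have its own Python. A minimal diagnostic sketch, assuming nothing about pip's internals: it only reports what `sys.executable` points at and whether a `python3` or `python` is on `PATH`. Which of the two pip ends up launching is not established in this thread, and `looks_like_python` and `interpreter_report` are hypothetical helper names.

```python
import os
import shutil
import sys


def looks_like_python(path):
    """Heuristic only: does the executable's file name start with 'python'?"""
    return bool(path) and os.path.basename(path).lower().startswith("python")


def interpreter_report(executable=None, which=shutil.which):
    """Summarise which interpreters a subprocess launch could end up using."""
    executable = sys.executable if executable is None else executable
    return {
        "sys_executable": executable,
        "sys_executable_is_python": looks_like_python(executable),
        # First interpreter found on PATH, or None if there is none.
        "python_on_path": which("python3") or which("python"),
    }


if __name__ == "__main__":
    for key, value in interpreter_report().items():
        print(key, "=", value)
```

Running this inside the add-on and on a clean machine should show whether a system interpreter is hiding the failure.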
@rteabeault interesting. Did you have that pip code in a repo?
@ianki The code is in a private repo but I will put the pip code here:
```python
import json
import runpy
import sys
from contextlib import redirect_stdout, redirect_stderr
from io import TextIOBase

from PyQt5.QtCore import QObject, pyqtSignal
from pip._internal.cli import progress_bars
from pip._internal.cli.progress_bars import DownloadProgressMixin
from pip._vendor.progress.bar import Bar
from pkg_resources import Requirement


class PipInstallerSignals(QObject):
    install_complete = pyqtSignal(object)
    install_progress = pyqtSignal(str)
    install_failed = pyqtSignal(object, object)


class PipInstaller:
    def __init__(self, requirement, target_path, install_deps=True):
        super(PipInstaller, self).__init__()
        self.requirement = requirement
        self.target_path = target_path
        self.install_deps = install_deps
        self.signals = PipInstallerSignals()
        # PipInstallProgress (not shown in this post) receives pip's output.
        self.output = PipInstallProgress(self)

    def run(self):
        # pip reads its command line from sys.argv, so swap it in temporarily.
        original_argv = sys.argv
        try:
            sys.argv = [
                'pip',
                'install',
                '--upgrade',
                '--no-cache-dir',
                '--progress-bar', 'qt_friendly',
                '--disable-pip-version-check',
                '-t', self.target_path,
                self.target()
            ]
            if not self.install_deps:
                sys.argv.append('--no-deps')
            with redirect_stdout(self.output), redirect_stderr(self.output):
                runpy.run_module("pip", run_name="__main__")
        except SystemExit as se:
            # pip finishes by raising SystemExit; exit code 0 means success.
            if se.code == 0:
                self.signals.install_complete.emit(self)
            else:
                self.signals.install_failed.emit(se, self)
        except Exception as e:
            self.signals.install_failed.emit(e, self)
        finally:
            sys.argv = original_argv

    def target(self):
        if isinstance(self.requirement, str):
            return self.requirement
        elif isinstance(self.requirement, Requirement):
            return self.requirement.url if self.requirement.url else str(self.requirement)
```
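The `PipInstallProgress` class that receives pip's redirected output is not included in the post. As a rough idea of what such a sink could look like, here is a minimal sketch that assumes it only needs to forward complete lines to a callback. `LineSink` is a hypothetical name, not the add-on's actual class.

```python
from contextlib import redirect_stdout
from io import TextIOBase


class LineSink(TextIOBase):
    """Hypothetical stand-in: forwards each complete line to a callback."""

    def __init__(self, on_line):
        super().__init__()
        self._on_line = on_line
        self._buffer = ""

    def writable(self):
        return True

    def write(self, text):
        # Accumulate text and emit one callback per finished line.
        self._buffer += text
        while "\n" in self._buffer:
            line, self._buffer = self._buffer.split("\n", 1)
            self._on_line(line)
        return len(text)

    def flush(self):
        # Emit whatever is left without a trailing newline.
        if self._buffer:
            self._on_line(self._buffer)
            self._buffer = ""


lines = []
sink = LineSink(lines.append)
with redirect_stdout(sink):
    print("Collecting spacy")
    print("Installing collected packages")
sink.flush()
```

In the installer above, the callback would presumably be something like `signals.install_progress.emit` rather than `lines.append`.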
Because pip is calling python in a subprocess and because anki does not ship with a "real" python this will not work as is. Now my addon does two things.
- It allows users to install spacy and its models from a UI. It handles compatibility of models with the installed version of spacy. It indicates to the user when a package has updates available. And it provides information about the model you have selected.
- It sends hooks that other Anki addons can subscribe to in order to get information about spacy in the current environment. That is how my integration with Morphman works; see https://github.com/rteabeault/MorphMan/commit/450413e45465dd5883af1f3acdc1646cdb5776af. When models are installed or removed, the proper hooks are sent and Morphman reacts by adding or removing the morphemizers from its registry. This also keeps the combo box for the morphemizers in sync. I feel the registry is a nice change for Morphman even without the spacy addon, as it could be used to register and manage other morphemizers, such as one for Sudachi.
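The hook-and-registry flow described above can be sketched roughly as follows. This is not the code from the linked commit: `MorphemizerRegistry`, `on_model_installed` and `on_model_removed` are hypothetical names, the hooks are simulated with plain function calls instead of Anki's hook system, and the model names are only examples.

```python
class MorphemizerRegistry:
    """Hypothetical registry keyed by model name; not MorphMan's actual class."""

    def __init__(self):
        self._morphemizers = {}
        self._listeners = []

    def on_change(self, callback):
        # For example, a callback that repopulates the morphemizer combo box.
        self._listeners.append(callback)

    def register(self, name, morphemizer):
        self._morphemizers[name] = morphemizer
        self._notify()

    def unregister(self, name):
        if self._morphemizers.pop(name, None) is not None:
            self._notify()

    def names(self):
        return sorted(self._morphemizers)

    def _notify(self):
        for callback in self._listeners:
            callback(self.names())


registry = MorphemizerRegistry()
snapshots = []
registry.on_change(snapshots.append)


def on_model_installed(model_name):
    # Stand-in for the handler subscribed to the "model installed" hook.
    registry.register(model_name, object())


def on_model_removed(model_name):
    # Stand-in for the handler subscribed to the "model removed" hook.
    registry.unregister(model_name)


on_model_installed("en_core_web_sm")
on_model_installed("ja_core_news_sm")
on_model_removed("en_core_web_sm")
```

Because every change notifies listeners with the full list of names, a UI element only has to redraw from that list to stay in sync.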
I could force the user to install python on their system to use the addon. This is unfortunate in my opinion, as it adds a complexity to the addon that I would prefer not to pass on to the user. At this point I am not sure how else to get around this. Another option is to ditch the UI altogether and just keep the hook passing part: make users pip install spacy and its models themselves. When Anki starts up it could look in the user's python site-packages and send the appropriate hooks to Morphman for what is installed. That would still be a decent experience, I think.
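A rough sketch of the "look in site-packages at startup" idea, under one loudly stated assumption: that an installed spaCy model package keeps a `meta.json` with `lang` and `name` keys in its top-level package directory. `find_spacy_models` is a hypothetical helper, and the path to scan would have to come from the user or from configuration.

```python
import json
from pathlib import Path


def find_spacy_models(site_packages):
    """Return {package_name: meta} for packages that look like spaCy models."""
    found = {}
    for meta_path in sorted(Path(site_packages).glob("*/meta.json")):
        try:
            meta = json.loads(meta_path.read_text(encoding="utf-8"))
        except (OSError, ValueError):
            continue  # unreadable or malformed file: skip it
        # Assumed marker of a model package: "lang" and "name" in meta.json.
        if isinstance(meta, dict) and "lang" in meta and "name" in meta:
            found[meta_path.parent.name] = meta
    return found
```

Each entry found this way could then be announced to Morphman through the same hooks the UI-based installer would have sent.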
@rteabeault thanks, I see the issue.
I was able to get Spacy's modules installed after changing these subprocess calls to exec().
.\pip\_internal\operations\build\metadata_legacy.py
```python
with build_env:
    #call_subprocess(
    #    args,
    #    cwd=source_dir,
    #    command_desc='python setup.py egg_info',
    #)
    print('call_subprocess', args, source_dir)
    prev_sys_argv = sys.argv.copy()
    prev_cwd = os.getcwd()
    prev_name = __name__
    try:
        cpos = args.index('-c')
        theargs = args[cpos+1]
        sys.argv = args[cpos+1:].copy()
        sys.argv[0] = '-c'
        os.chdir(source_dir)
        print('set sys.argv to', sys.argv)
        print('set cwd to', os.getcwd())
        print('running eval on: %s' % theargs)
        globals()['__name__'] = '__main__'
        exec(theargs, globals(), globals())
    finally:
        globals()['__name__'] = prev_name
        sys.argv = prev_sys_argv
        os.chdir(prev_cwd)
```
.\pip\_internal\operations\install\legacy.py
```python
with indent_log(), build_env:
    #runner(
    #    cmd=install_args,
    #    cwd=unpacked_source_directory,
    #)
    print('runner', install_args, unpacked_source_directory)
    prev_sys_argv = sys.argv.copy()
    prev_cwd = os.getcwd()
    prev_name = __name__
    try:
        cpos = install_args.index('-c')
        theargs = install_args[cpos+1]
        sys.argv = install_args[cpos+1:].copy()
        sys.argv[0] = '-c'
        os.chdir(unpacked_source_directory)
        print('set sys.argv to', sys.argv)
        print('set cwd to', os.getcwd())
        print('running eval on: %s' % theargs)
        globals()['__name__'] = '__main__'
        exec(theargs, globals(), globals())
    finally:
        globals()['__name__'] = prev_name
        sys.argv = prev_sys_argv
        os.chdir(prev_cwd)
```
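Both patches follow the same pattern: take the `python -c <code> <args...>` command pip was about to launch, and run `<code>` in the current interpreter with `sys.argv` and the working directory temporarily swapped. A self-contained sketch of that pattern is below. `run_dash_c_in_process` is a hypothetical helper name, and unlike the patches it executes the code in a fresh namespace instead of the module's own `globals()`, which is a deliberate variation rather than what the modified pip does.

```python
import os
import sys


def run_dash_c_in_process(args, cwd):
    """Run the code from a [..., '-c', code, *rest] command in this interpreter."""
    cpos = args.index('-c')
    code = args[cpos + 1]
    prev_argv = sys.argv
    prev_cwd = os.getcwd()
    # Fresh namespace: the script sees __name__ == '__main__' without
    # the caller's globals being modified.
    namespace = {'__name__': '__main__'}
    try:
        # Mirror what `python -c` would expose: '-c' followed by the extra args.
        sys.argv = ['-c'] + list(args[cpos + 2:])
        os.chdir(cwd)
        exec(code, namespace)
    finally:
        sys.argv = prev_argv
        os.chdir(prev_cwd)
    return namespace


ns = run_dash_c_in_process(
    ['python', '-u', '-c', 'import sys; seen_argv = list(sys.argv)', 'egg_info'],
    cwd=os.getcwd(),
)
```

Only `sys.argv` and the working directory are restored afterwards. Anything else the executed code changes in the process (imported modules, environment variables) stays changed, and a `SystemExit` raised by the script propagates to the caller.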
I've actually gotten the modules to install without an external Python. Are you interested in experimenting further?
@rteabeault here's the modified pip.zip.
I definitely am interested in experimenting further. I will take a look at this tomorrow. Thanks so much for looking at this!
Cool. I have been testing this and it seems to work well, but we should test on multiple platforms.
A few questions on my mind -
- Is there a particular reason to have the installation UI as a separate add-on instead of built into MorphMan?
- Should we support more than SpaCy? I could see other python NLP modules also being useful to add in a similar way.
Hi @ianki. Your patch seems to be working for me on OS X. What OS were you testing on?
Once I realized that we couldn't package Morphman with spacy, I decided to create the package manager. I thought installing spacy and its models and then adding the spacy package to the sys path might be useful to other addon developers, so I decided to make it a separate addon. With that said, I could be convinced to make it part of Morphman, but I wanted to give it maximum usability. And in its current state we could expand what I have to handle other NLP packages. What do you think? In the meantime I am going to continue "finishing" this addon. We can decide to roll it into Morphman later and/or add support for other packages.
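Making a pip `-t` target directory importable for every add-on in the same process can be as small as the sketch below. `add_package_dir` is a hypothetical helper, and placing the directory at the front of `sys.path` is an assumption: it gives the installed packages priority, which also means they can shadow modules that Anki bundles.

```python
import importlib
import sys


def add_package_dir(target_path):
    """Make packages installed into target_path importable in this process."""
    if target_path not in sys.path:
        sys.path.insert(0, target_path)
    # Let the import system notice files that were created after startup.
    importlib.invalidate_caches()
```

After `add_package_dir(target_path)`, any add-on loaded in the same interpreter should be able to `import spacy` without knowing where it was installed.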
I've tested this on Windows and Mac and it seems to be working with the 'spacy' models. No objection on my side to finishing the add-on.
I also tried it with 'fugashi', and though the package gets installed it ends up missing the necessary mecab.dll, but I haven't debugged it further.