Add initial support for SpaCy and SudachiPy parsers.

Open ianki opened this issue 4 years ago • 13 comments

ianki avatar Nov 10 '20 06:11 ianki

So I am nearly finished with a separate Anki plugin that will install and manage spacy and its models for you. You won't need to have Python installed on your computer. Additionally, I have made changes to Morphman so that it listens to events from this spacy plugin and registers morphemizers for all installed language models.

rteabeault avatar Nov 10 '20 19:11 rteabeault

That's great, I won't land this then. What's your timeline for the new plugin?

ianki avatar Nov 10 '20 19:11 ianki

Most of what is left is clean-up, UI styling and testing. Realistically, unless my paying job gets in the way, I believe another week. Here is the UI currently.

image

rteabeault avatar Nov 10 '20 21:11 rteabeault

@rteabeault how's it going with the SpaCy installer?

ianki avatar Nov 27 '20 16:11 ianki

@ianki I thought I was almost done, but I just ran into a significant problem with the package installation. My goal was to allow people to install all needed packages without having a separate python installed. I thought I had this working but what I didn't realize is that pip is (in some cases) forking a subprocess and calling python directly on setup.py. My own installed python was masking this problem. I am not sure how I am going to be able to fix this which really blows as this was coming together nicely. I will continue to try and figure out how to fix this but we may want to proceed with your solution for now. My apologies.
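
A quick way to see whether this affects a given environment is to check if the running process can launch a working interpreter at all. The following is only a diagnostic sketch, not part of the add-on: `can_spawn_python` is a hypothetical helper, and it assumes the subprocess pip starts is launched from `sys.executable`, which inside a bundled application may not be a real Python binary.

```python
import subprocess
import sys


def can_spawn_python(executable=None, timeout=10):
    """Return True if `executable` can run a trivial Python snippet.

    Hypothetical helper: defaults to sys.executable, which is assumed
    to be what a subprocess-based build step would invoke.
    """
    executable = executable or sys.executable
    if not executable:
        return False
    try:
        result = subprocess.run(
            [executable, "-c", "import sys; print(sys.version_info[0])"],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except (OSError, subprocess.SubprocessError):
        # Missing binary, permission problem, or timeout.
        return False
    # A real interpreter exits cleanly and prints its major version.
    return result.returncode == 0 and result.stdout.strip().isdigit()
```

From a regular Python install this should return True. Inside a frozen application the result depends on what `sys.executable` points to, and the call may start the host application itself, so the timeout is there as a guard.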

rteabeault avatar Nov 28 '20 08:11 rteabeault

@rteabeault interesting. Did you have that pip code in a repo?

ianki avatar Nov 28 '20 20:11 ianki

@ianki The code is in a private repo but I will put the pip code here:


import json
import runpy
import sys
from contextlib import redirect_stdout, redirect_stderr
from io import TextIOBase

from PyQt5.QtCore import QObject, pyqtSignal
from pip._internal.cli import progress_bars
from pip._internal.cli.progress_bars import DownloadProgressMixin
from pip._vendor.progress.bar import Bar
from pkg_resources import Requirement


class PipInstallerSignals(QObject):
  install_complete = pyqtSignal(object)
  install_progress = pyqtSignal(str)
  install_failed = pyqtSignal(object, object)

class PipInstaller:
  def __init__(self, requirement, target_path, install_deps=True):
    super(PipInstaller, self).__init__()
    self.requirement = requirement
    self.target_path = target_path
    self.install_deps = install_deps
    self.signals = PipInstallerSignals()
    self.output = PipInstallProgress(self)

  def run(self):
    original_argv = sys.argv
    try:
      sys.argv = [
        'pip',
        'install',
        '--upgrade',
        '--no-cache-dir',
        '--progress-bar', 'qt_friendly',
        '--disable-pip-version-check',
        '-t', self.target_path,
        self.target()
      ]

      if not self.install_deps:
        sys.argv.append('--no-deps')

      with redirect_stdout(self.output), redirect_stderr(self.output):
        runpy.run_module("pip", run_name="__main__")
    except SystemExit as se:
      if se.code == 0:
        self.signals.install_complete.emit(self)
      else:
        self.signals.install_failed.emit(se, self)
    except Exception as e:
      self.signals.install_failed.emit(e, self)
    finally:
      sys.argv = original_argv

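  def target(self):
    # Accept either a plain requirement string or a pkg_resources Requirement.
    if isinstance(self.requirement, str):
      return self.requirement
    if isinstance(self.requirement, Requirement):
      # Prefer a direct URL when the requirement carries one.
      return self.requirement.url or str(self.requirement)
    # Fail loudly instead of silently passing None to pip.
    raise TypeError('unsupported requirement type: %r' % type(self.requirement))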
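
The snippet above refers to `PipInstallProgress`, which is not included in the paste. As a rough idea of what such an output sink could look like, here is a minimal sketch. `LineForwarder`, `on_line` and `finish` are hypothetical names, and a plain callback stands in for the Qt signal so the example runs without PyQt.

```python
from contextlib import redirect_stdout
from io import TextIOBase


class LineForwarder(TextIOBase):
    """Hypothetical stand-in for PipInstallProgress.

    Collects written text and hands each complete line to a callback.
    """

    def __init__(self, on_line):
        super().__init__()
        self._on_line = on_line
        self._buffer = ""

    def writable(self):
        return True

    def write(self, text):
        self._buffer += text
        # Everything before the last newline is complete; keep the rest.
        *lines, self._buffer = self._buffer.split("\n")
        for line in lines:
            self._on_line(line)
        return len(text)

    def finish(self):
        """Forward any trailing text that was not newline-terminated."""
        if self._buffer:
            self._on_line(self._buffer)
            self._buffer = ""


if __name__ == "__main__":
    received = []
    sink = LineForwarder(received.append)
    with redirect_stdout(sink):
        print("Collecting spacy")
    sink.finish()
    # `received` now holds the captured lines.
```

In the add-on the callback would presumably emit `install_progress`, but that wiring is an assumption.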

Because pip is calling python in a subprocess and because anki does not ship with a "real" python this will not work as is. Now my addon does two things.

  1. It allows users to install spacy and its models from a UI. It handles compatibility of models with the installed version of spacy. It indicates to the user when a package has updates available. And it provides information about the model you have selected.

image

  2. It sends hooks that other Anki addons can subscribe to in order to get information about spacy in the current environment. That is how my integration with Morphman works here: https://github.com/rteabeault/MorphMan/commit/450413e45465dd5883af1f3acdc1646cdb5776af When models are installed or removed, the proper hooks are sent and Morphman reacts by adding/removing the morphemizers from its registry. This also keeps the combo box for the morphemizers in sync. I feel the registry is a nice change for Morphman even without the spacy addon, as it could be used to register and manage other morphemizers such as Sudachi.
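
To make the registry idea concrete, here is a small sketch of how install/remove events could drive it. This is not the code from the linked commit: `MorphemizerRegistry`, `on_model_installed`, `on_model_removed` and the `spacy:` name prefix are all hypothetical, and plain callbacks stand in for Anki hooks.

```python
class MorphemizerRegistry:
    """Hypothetical registry of morphemizers keyed by name."""

    def __init__(self):
        self._morphemizers = {}
        self._listeners = []

    def subscribe(self, listener):
        """Register listener(event, name); event is 'added' or 'removed'."""
        self._listeners.append(listener)

    def register(self, name, morphemizer):
        self._morphemizers[name] = morphemizer
        self._notify("added", name)

    def unregister(self, name):
        # Only notify if something was actually removed.
        if self._morphemizers.pop(name, None) is not None:
            self._notify("removed", name)

    def names(self):
        """Sorted names, e.g. for refreshing a combo box."""
        return sorted(self._morphemizers)

    def _notify(self, event, name):
        for listener in list(self._listeners):
            listener(event, name)


def on_model_installed(registry, model_name, make_morphemizer):
    """Hook handler (assumed shape): a model was installed."""
    registry.register("spacy:" + model_name, make_morphemizer(model_name))


def on_model_removed(registry, model_name):
    """Hook handler (assumed shape): a model was removed."""
    registry.unregister("spacy:" + model_name)
```

A UI element such as the morphemizer combo box could subscribe as a listener and rebuild its entries from `names()` whenever an event arrives.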

I could force the user to install Python on their system to use the addon. This is unfortunate in my opinion, as it adds complexity that I would have preferred not to pass on to the user. At this point I am not sure how else to get around this. Another option is to ditch the UI altogether and just keep the hook-passing part: have users pip install spacy and its models themselves. When Anki starts up, the addon could look in the user's Python site-packages and send the appropriate hooks to Morphman for what is installed. That would still be a decent experience, I think.
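
For the "look in the user's site-packages" option, the standard library can list what is installed in a given directory without importing anything from it. The sketch below assumes Python 3.8+ for `importlib.metadata`; `installed_distributions` is a hypothetical helper, and deciding which of the returned names are spacy models is left to the caller.

```python
from importlib import metadata
from pathlib import Path


def installed_distributions(site_packages):
    """Map distribution name -> version for one site-packages directory.

    Hypothetical helper: it only reads package metadata (dist-info or
    egg-info) and never imports the packages themselves.
    """
    found = {}
    search_path = [str(Path(site_packages))]
    for dist in metadata.distributions(path=search_path):
        name = dist.metadata["Name"]
        if name:
            found[name] = dist.version
    return found
```

A startup routine could call this on the user's site-packages path, check whether `spacy` is present, and send one hook per installed model.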

rteabeault avatar Dec 04 '20 00:12 rteabeault

@rteabeault thanks, I see the issue.

I was able to get Spacy's modules installed after changing these subprocess calls to exec().

.\pip\_internal\operations\build\metadata_legacy.py

with build_env:
        #call_subprocess(
        #    args,
        #    cwd=source_dir,
        #    command_desc='python setup.py egg_info',
        #)

        print('call_subprocess', args, source_dir)

        prev_sys_argv = sys.argv.copy()
        prev_cwd = os.getcwd()
        prev_name = __name__
        try:
            cpos = args.index('-c')
            theargs = args[cpos+1]
            sys.argv = args[cpos+1:].copy()
            sys.argv[0] = '-c'
            os.chdir(source_dir)

            print('set sys.argv to', sys.argv)
            print('set cwd to', os.getcwd())
            print('running exec on: %s' % theargs)

            globals()['__name__'] = '__main__'
            exec(theargs, globals(), globals())
        finally:
            globals()['__name__'] = prev_name
            sys.argv = prev_sys_argv
            os.chdir(prev_cwd)

.\pip\_internal\operations\install\legacy.py

with indent_log(), build_env:
                #runner(
                #    cmd=install_args,
                #    cwd=unpacked_source_directory,
                #)

                print('runner', install_args, unpacked_source_directory)

                prev_sys_argv = sys.argv.copy()
                prev_cwd = os.getcwd()
                prev_name = __name__
                try:
                    cpos = install_args.index('-c')
                    theargs = install_args[cpos+1]
                    sys.argv = install_args[cpos+1:].copy()
                    sys.argv[0] = '-c'
                    os.chdir(unpacked_source_directory)

                    print('set sys.argv to', sys.argv)
                    print('set cwd to', os.getcwd())
                    print('running exec on: %s' % theargs)

                    globals()['__name__'] = '__main__'
                    exec(theargs, globals(), globals())
                finally:
                    globals()['__name__'] = prev_name
                    sys.argv = prev_sys_argv
                    os.chdir(prev_cwd)
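
Both patches follow the same pattern, so it can be factored into one helper. The sketch below is an illustration rather than the patched pip code: `run_dash_c_in_process` is a hypothetical name, and unlike the patches above it executes the code in a fresh namespace instead of the calling module's `globals()`.

```python
import os
import sys


def run_dash_c_in_process(args, cwd):
    """Run the code from a `python -c <code> [args...]` command in-process.

    Hypothetical helper: temporarily swaps sys.argv and the working
    directory, executes the code as if it were __main__, then restores
    both. Returns the namespace the code ran in.
    """
    cpos = args.index("-c")
    code = args[cpos + 1]
    prev_argv = sys.argv
    prev_cwd = os.getcwd()
    # Fresh namespace so the executed code cannot clobber our globals.
    namespace = {"__name__": "__main__"}
    try:
        # Mimic what the interpreter does for `python -c`: argv[0] is "-c".
        sys.argv = ["-c"] + list(args[cpos + 2:])
        os.chdir(cwd)
        exec(code, namespace)
    finally:
        sys.argv = prev_argv
        os.chdir(prev_cwd)
    return namespace
```

Code run this way shares the host process, so a `sys.exit()` inside a setup script raises `SystemExit` in the caller, and any modules it imports stay loaded. Whether that is acceptable for every package is an open question.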

I've actually gotten the modules to install without an external Python. Are you interested in experimenting further?

ianki avatar Dec 06 '20 01:12 ianki

@rteabeault here's the modified pip.zip.

ianki avatar Dec 06 '20 03:12 ianki

I definitely am interested in experimenting further. I will take a look at this tomorrow. Thanks so much for looking at this!

rteabeault avatar Dec 06 '20 03:12 rteabeault

Cool. I have been testing this and it seems to work well, but we should test on multiple platforms.

A few questions on my mind -

  • Is there a particular reason to have the installation UI as a separate add-on instead of built into MorphMan?
  • Should we support more than SpaCy? I could see other python NLP modules also being useful to add in a similar way.

ianki avatar Dec 06 '20 09:12 ianki

Hi @ianki. Your patch seems to be working for me on OS X. What OS were you testing on?

Once I realized that we couldn't package Morphman with spacy, I decided to create the package manager. I thought installing spacy and its models and then adding the spacy package to the sys path might be useful to other addon developers, so I decided to make it a separate addon. With that said, I could be convinced to make it part of Morphman, but I wanted to give it maximum usability. And in its current state we could expand what I have to handle other NLP packages. What do you think? In the meantime I am going to continue "finishing" this addon. We can decide to roll it into Morphman later and/or add support for other packages.

rteabeault avatar Dec 07 '20 00:12 rteabeault

I've tested this on Windows and Mac and it seems to be working with the 'spacy' models. No objection on my side to finishing the add-on.

I also tried it with 'fugashi', and though the package gets installed it ends up missing the necessary mecab.dll, but I haven't debugged it further.

ianki avatar Dec 08 '20 04:12 ianki