Docker support for Stanza NLP Engine
I'm trying to start the analyzer API with the Stanza engine.
conf/default.yaml:
nlp_engine_name: stanza
models:
  -
    lang_code: en
    model_name: en
Output:
root@27ed8ca2f545:/usr/bin/presidio-analyzer# pipenv run python app.py --host 0.0.0.0
2023-02-09 20:41:38,211 - presidio-analyzer - INFO - Starting analyzer engine
2023-02-09 20:41:38,214 - presidio-analyzer - INFO - nlp_engine not provided, creating default.
Traceback (most recent call last):
  File "/usr/bin/presidio-analyzer/app.py", line 130, in <module>
    server = Server()
  File "/usr/bin/presidio-analyzer/app.py", line 40, in __init__
    self.engine = AnalyzerEngine()
  File "/usr/bin/presidio-analyzer/presidio_analyzer/analyzer_engine.py", line 58, in __init__
    nlp_engine = provider.create_engine()
  File "/usr/bin/presidio-analyzer/presidio_analyzer/nlp_engine/nlp_engine_provider.py", line 81, in create_engine
    raise ValueError(
ValueError: NLP engine 'stanza' is not available. Make sure you have all required packages installed
I've tried installing stanza manually, both at the container level and within an activated virtual environment:
# install for container
pipenv install stanza
# install for virtual env
source .venv/bin/activate
pipenv install stanza
But I get the same error. What am I missing here?
Originally posted by @edeak in https://github.com/microsoft/presidio/discussions/1027
Hi @edeak, this is not yet supported, but it should be a simple extension. If you look at the Dockerfile.transformers file, adapting it to Stanza should be relatively easy:
FROM python:3.9-slim
ARG NAME
ARG NLP_CONF_FILE=conf/stanza.yaml
ENV PIPENV_VENV_IN_PROJECT=1
ENV PIP_NO_CACHE_DIR=1
WORKDIR /usr/bin/${NAME}
COPY ./Pipfile* /usr/bin/${NAME}/
RUN pip install pipenv \
&& pipenv sync
RUN pipenv install spacy-stanza --skip-lock
# install nlp models specified in NLP_CONF_FILE (conf/stanza.yaml by default)
COPY ./install_nlp_models.py /usr/bin/${NAME}/
COPY ${NLP_CONF_FILE} /usr/bin/${NAME}/${NLP_CONF_FILE}
RUN pipenv run python install_nlp_models.py --conf_file ${NLP_CONF_FILE}
COPY . /usr/bin/${NAME}/
EXPOSE ${PORT}
CMD pipenv run python app.py --host 0.0.0.0
I haven't tested this, so if you give it a try, please reply with whether it worked. A PR would be a fantastic addition.
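One way to check whether an environment has what the Stanza engine needs is to test the imports directly. The sketch below assumes that Presidio's Stanza engine requires both the `stanza` and `spacy_stanza` modules (the latter is what the `spacy-stanza` package installed in the Dockerfile above provides). That would explain why installing `stanza` alone still produced the "not available" error, but it is an assumption, not something confirmed in this thread. The helper name `missing_modules` is hypothetical.

```python
import importlib.util

# Assumption: the Stanza NLP engine needs both of these importable modules.
# `spacy_stanza` is the module name provided by the `spacy-stanza` package.
REQUIRED_MODULES = ("stanza", "spacy_stanza")


def missing_modules(names=REQUIRED_MODULES):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("stanza and spacy_stanza are importable")
```

Saved as a script (any file name works), it could be run inside the container with `pipenv run python <script>` so that it checks the same virtual environment that app.py uses.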
Hi @omri374
I was able to start my container with Stanza, but it takes a bit more than editing the Dockerfile. As I understand it, the appropriate NLP engine has to be passed to the app; otherwise the default spaCy engine is used. So in app.py, line 41, I do:
self.engine = AnalyzerEngine(nlp_engine={ENGINE})
where {ENGINE} would be a StanzaNlpEngine instance for Stanza or a TransformersNlpEngine instance for transformers.
I'll play with it, and maybe I can put together a PR that injects the right NLP engine based on the passed conf.
Yes, app.py has to be adapted for this too. It should be straightforward if you inject the conf file as you mentioned.
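A minimal sketch of what injecting the conf file into app.py could look like, using `NlpEngineProvider` with its `conf_file` argument to build the engine that is passed to `AnalyzerEngine`. The helper names `resolve_conf_file` and `build_analyzer` are hypothetical, and so is reading the path from an `NLP_CONF_FILE` environment variable at runtime: the Dockerfile above only declares `NLP_CONF_FILE` as a build `ARG`, so it would also need an `ENV` line for the variable to be visible to the app. This has not been tested against the Presidio image.

```python
import os


def resolve_conf_file(default="conf/default.yaml", env_var="NLP_CONF_FILE"):
    """Pick the NLP conf path from the environment, falling back to the default."""
    # Assumption: the container exposes the conf path through this env variable.
    return os.environ.get(env_var) or default


def build_analyzer(conf_file):
    """Create an AnalyzerEngine whose NLP engine is built from the given conf file."""
    # Imported lazily so the path helper above works without Presidio installed.
    from presidio_analyzer import AnalyzerEngine
    from presidio_analyzer.nlp_engine import NlpEngineProvider

    # The provider reads nlp_engine_name and models from the YAML file.
    nlp_engine = NlpEngineProvider(conf_file=conf_file).create_engine()
    return AnalyzerEngine(nlp_engine=nlp_engine)
```

In app.py, the `self.engine = AnalyzerEngine()` line would then become something like `self.engine = build_analyzer(resolve_conf_file())`. If the conf lists languages other than `en`, `AnalyzerEngine` would likely also need a matching `supported_languages` argument.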