Build/Install issues on x86_64 Linux
Describe the bug
Presidio is failing to build/install against Python 3.11 (officially supported per docs) and 3.12 on x86_64 Linux.
I have tried both spaCy and Stanza as per https://microsoft.github.io/presidio/installation/ and always hit the following error, which looks like a version incompatibility between numpy and something else (probably a compiled library lower down in the import graph).
ValueError: numpy.dtype size changed, may indicate binary incompatibility. Expected 96 from C header, got 88 from PyObject
I have replicated the same issue in a clean container using the official upstream Python image.
Thank you for looking into it :pray:
To Reproduce
- Create the following Dockerfile
$ cat Dockerfile
FROM docker.io/library/python:3.11.9
RUN pip install presidio_analyzer && pip install presidio_anonymizer && python -m spacy download en_core_web_lg
- Build it
$ podman build .
STEP 1/2: FROM docker.io/library/python:3.11.9
STEP 2/2: RUN pip install presidio_analyzer && pip install presidio_anonymizer && python -m spacy download en_core_web_lg
Collecting presidio_analyzer
Downloading presidio_analyzer-2.2.354-py3-none-any.whl.metadata (2.6 kB)
Collecting spacy<4.0.0,>=3.4.4 (from presidio_analyzer)
Downloading spacy-3.7.5-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (27 kB)
<CUT_FOR_BREVITY>
- Observe the error towards the end of the build (NOTE: the warning about running as root and the venv is noise as this is in a container)
Installing collected packages: pycryptodome, presidio_anonymizer
Successfully installed presidio_anonymizer-2.2.354 pycryptodome-3.20.0
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
Traceback (most recent call last):
  File "<frozen runpy>", line 189, in _run_module_as_main
  File "<frozen runpy>", line 148, in _get_module_details
  File "<frozen runpy>", line 112, in _get_module_details
  File "/usr/local/lib/python3.11/site-packages/spacy/__init__.py", line 6, in <module>
    from .errors import setup_default_warnings
  File "/usr/local/lib/python3.11/site-packages/spacy/errors.py", line 3, in <module>
    from .compat import Literal
  File "/usr/local/lib/python3.11/site-packages/spacy/compat.py", line 39, in <module>
    from thinc.api import Optimizer # noqa: F401
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/thinc/api.py", line 1, in <module>
    from .backends import (
  File "/usr/local/lib/python3.11/site-packages/thinc/backends/__init__.py", line 17, in <module>
    from .cupy_ops import CupyOps
  File "/usr/local/lib/python3.11/site-packages/thinc/backends/cupy_ops.py", line 16, in <module>
    from .numpy_ops import NumpyOps
  File "thinc/backends/numpy_ops.pyx", line 1, in init thinc.backends.numpy_ops
ValueError: numpy.dtype size changed, may indicate binary incompatibility. Expected 96 from C header, got 88 from PyObject
Error: building at STEP "RUN pip install presidio_analyzer && pip install presidio_anonymizer && python -m spacy download en_core_web_lg": while running runtime: exit status 1
Note: Using Stanza instead of spaCy, we can build the container (the libraries install successfully), but we hit the same error as soon as we try to use the library, e.g.:
from presidio_analyzer import AnalyzerEngine
This import triggers the same error.
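When the import fails like this, it helps to record which versions of the packages in the failing import chain are actually installed. The following is a minimal diagnostic sketch that uses only the standard library, so it runs even when importing spaCy or thinc fails. The helper name `installed_versions` is hypothetical and not part of Presidio.

```python
from importlib import metadata


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Packages that appear in the traceback's import chain, plus Presidio itself.
    report = installed_versions(["numpy", "thinc", "spacy", "presidio_analyzer"])
    for name, version in report.items():
        print(f"{name}: {version or 'not installed'}")
```

Reading distribution metadata instead of importing the packages avoids triggering the compiled-extension check that raises the ValueError.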
Expected behavior
Able to install the library, import it and run the demo code (https://microsoft.github.io/presidio/getting_started/)
Screenshots
N/A
Additional context
Looking at the official Docker image, it seems Python 3.9 is being used:
$ podman run --rm -it mcr.microsoft.com/presidio-analyzer bash
root@d9fed78f0a52:/usr/bin/presidio-analyzer# python -V
Python 3.9.19
Building against this exact version of Python yields the same error.
We began seeing this issue in the past day or two as well. Following.
Thanks for posting. Looks like an issue between spaCy and numpy. Consider trying to pip install numpy as well.
A new version of numpy got released two days ago. I had some luck pinning numpy==1.26.4.
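Pinning to 1.26.4 keeps numpy on the 1.x series. A small guard along the following lines could fail fast with a clearer message than the binary-incompatibility error. It is only a sketch: it assumes the new release mentioned above is NumPy 2.0 and that the installed thinc wheel was compiled against numpy 1.x. The function name `is_numpy_2_or_later` is hypothetical.

```python
from importlib import metadata


def is_numpy_2_or_later(version):
    """Return True if a version string such as '2.0.0' has major version >= 2."""
    major = version.split(".", 1)[0]
    return int(major) >= 2


if __name__ == "__main__":
    try:
        installed = metadata.version("numpy")
    except metadata.PackageNotFoundError:
        print("numpy is not installed")
    else:
        if is_numpy_2_or_later(installed):
            # Assumption: the compiled thinc wheel was built against numpy 1.x.
            print(f"numpy {installed} found; try pinning numpy==1.26.4")
        else:
            print(f"numpy {installed} is on the 1.x series")
```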
Root cause: https://github.com/explosion/thinc/issues/939
https://stackoverflow.com/questions/78650222/valueerror-numpy-dtype-size-changed-may-indicate-binary-incompatibility-expec