Build/Install issues on x86_64 Linux

Open dushankw opened this issue 1 year ago • 5 comments

Describe the bug

Presidio is failing to build/install against Python 3.11 (officially supported per docs) and 3.12 on x86_64 Linux.

Having tried both spaCy and Stanza as per https://microsoft.github.io/presidio/installation/ I am always encountering the following issue, seemingly a version incompatibility between numpy and something else (probably a compiled library lower down in the import graph).

ValueError: numpy.dtype size changed, may indicate binary incompatibility. Expected 96 from C header, got 88 from PyObject

I have replicated the same issue in a clean container using the official Python upstream image

Thank you for looking into it :pray:

To Reproduce

  1. Create the following Dockerfile
$ cat Dockerfile 
FROM docker.io/library/python:3.11.9
RUN pip install presidio_analyzer && pip install presidio_anonymizer && python -m spacy download en_core_web_lg
  2. Build it
$ podman build .
STEP 1/2: FROM docker.io/library/python:3.11.9
STEP 2/2: RUN pip install presidio_analyzer && pip install presidio_anonymizer && python -m spacy download en_core_web_lg
Collecting presidio_analyzer
  Downloading presidio_analyzer-2.2.354-py3-none-any.whl.metadata (2.6 kB)
Collecting spacy<4.0.0,>=3.4.4 (from presidio_analyzer)
  Downloading spacy-3.7.5-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (27 kB)
<CUT_FOR_BREVITY>
  3. Observe the error towards the end of the build (NOTE: the warning about running as root and the venv is noise, as this is in a container)
Installing collected packages: pycryptodome, presidio_anonymizer
Successfully installed presidio_anonymizer-2.2.354 pycryptodome-3.20.0
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
Traceback (most recent call last):
  File "<frozen runpy>", line 189, in _run_module_as_main
  File "<frozen runpy>", line 148, in _get_module_details
  File "<frozen runpy>", line 112, in _get_module_details
  File "/usr/local/lib/python3.11/site-packages/spacy/__init__.py", line 6, in <module>
    from .errors import setup_default_warnings
  File "/usr/local/lib/python3.11/site-packages/spacy/errors.py", line 3, in <module>
    from .compat import Literal
  File "/usr/local/lib/python3.11/site-packages/spacy/compat.py", line 39, in <module>
    from thinc.api import Optimizer  # noqa: F401
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/thinc/api.py", line 1, in <module>
    from .backends import (
  File "/usr/local/lib/python3.11/site-packages/thinc/backends/__init__.py", line 17, in <module>
    from .cupy_ops import CupyOps
  File "/usr/local/lib/python3.11/site-packages/thinc/backends/cupy_ops.py", line 16, in <module>
    from .numpy_ops import NumpyOps
  File "thinc/backends/numpy_ops.pyx", line 1, in init thinc.backends.numpy_ops
ValueError: numpy.dtype size changed, may indicate binary incompatibility. Expected 96 from C header, got 88 from PyObject
Error: building at STEP "RUN pip install presidio_analyzer && pip install presidio_anonymizer && python -m spacy download en_core_web_lg": while running runtime: exit status 1

Note: Using Stanza instead of spaCy, we are able to build the container (install the libraries) successfully, but we hit the same error as soon as we try to use the library, e.g.:

from presidio_analyzer import AnalyzerEngine

This triggers the same error.

Expected behavior

Able to install the library, import it and run the demo code (https://microsoft.github.io/presidio/getting_started/)

Screenshots

N/A

Additional context

Looking at the official Docker image, it seems Python 3.9 is being used

$ podman run --rm -it mcr.microsoft.com/presidio-analyzer bash
root@d9fed78f0a52:/usr/bin/presidio-analyzer# python -V
Python 3.9.19

Trying to build against this exact version of Python yields the same error
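
Since the error shows up regardless of the Python version, it may help to record which package versions pip actually resolved. The following is a minimal sketch, not part of the original report: `installed_versions` is a hypothetical helper name, and the package list is an assumption based on the names in the traceback above. It reads package metadata only, so it does not import spaCy or thinc and should not trigger the `ValueError` itself.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Return {package: version or None} without importing the packages."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # Package is not installed in this environment.
            found[name] = None
    return found


if __name__ == "__main__":
    # Assumed list of relevant packages, taken from the traceback.
    names = ["numpy", "thinc", "spacy", "presidio-analyzer"]
    for name, ver in installed_versions(names).items():
        print(f"{name}: {ver or 'not installed'}")
```

Running this inside the failing container and pasting the output into the issue would show whether numpy and thinc were resolved to mismatched releases.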

dushankw, Jun 17 '24 10:06

We began seeing this issue in the past day or two as well. Following.

codingbandit, Jun 18 '24 15:06

Thanks for posting. Looks like an issue between spaCy and numpy. Consider trying to pip install numpy explicitly as well.

omri374, Jun 18 '24 20:06

A new version of numpy got released two days ago. I had some luck pinning numpy==1.26.4.
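
A pin like this can be applied with `pip install "numpy==1.26.4"` before or alongside the Presidio packages. As a rough sketch of how one might guard against the mismatch at startup, the check below compares the installed numpy major version against a ceiling. It assumes the new release is a 2.x version and that the installed thinc wheel was built against numpy 1.x, which this thread suggests but does not confirm; `numpy_major` and `check_numpy_pin` are hypothetical helper names.

```python
from importlib.metadata import PackageNotFoundError, version


def numpy_major(version_string: str) -> int:
    """Extract the major version number from a string such as '1.26.4'."""
    return int(version_string.split(".", 1)[0])


def check_numpy_pin(max_major: int = 1) -> str:
    """Describe whether the installed numpy is at or below max_major.

    Reads package metadata only, so numpy itself is never imported.
    """
    try:
        installed = version("numpy")
    except PackageNotFoundError:
        return "numpy is not installed"
    if numpy_major(installed) > max_major:
        # Assumption: compiled dependencies expect the numpy 1.x ABI.
        return (
            f"numpy {installed} is newer than major version {max_major}; "
            "consider pinning, e.g. pip install 'numpy==1.26.4'"
        )
    return f"numpy {installed} is within the pinned range"


if __name__ == "__main__":
    print(check_numpy_pin())
```

The pin is a workaround rather than a fix: it should be dropped once the compiled dependencies ship wheels built against the newer numpy.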

JosephCatrambone, Jun 18 '24 21:06

Root cause: https://github.com/explosion/thinc/issues/939

omri374, Jun 19 '24 06:06

https://stackoverflow.com/questions/78650222/valueerror-numpy-dtype-size-changed-may-indicate-binary-incompatibility-expec

RubTalha, Jul 08 '24 19:07