spaCy
Bus error upon exiting the program after using spaCy on Mac M1
How to reproduce the behaviour
On an M1 Mac, any code that uses spaCy to parse a doc fails on exit with the error below. I can only test this on my laptop; the same code works fine on a Linux machine.
[1] 73089 bus error
This happens with both the sm and trf models.
Your Environment
Info about spaCy
- spaCy version: 3.7.2
- Platform: macOS-14.1.2-arm64-arm-64bit
- Python version: 3.11.3
- Pipelines: en_core_web_sm (3.7.0), en_core_web_md (3.7.1), en_core_web_trf (3.7.2), en_core_web_lg (3.7.0)
Here is some binary traceback info:
https://gist.github.com/koder-ua/8fd3e3fd795674b01d1ddbeda9400999
Thanks for the report!
The info provided makes this look specific to the trf model, in particular curated-tokenizers. If you have a minute, could you create a new venv without installing torch and with only the en_core_web_sm model and see if you still get the same error?
@adrianeboyd yep, seems like you're right: on a clean python3.11 with only spacy & en_core_web_sm installed, everything works fine.
python3.11 with only spacy and en_core_web_sm
~ python -c 'import spacy; npl = spacy.load("en_core_web_sm"); npl("some text")'
~
python3.11 with pytorch & co
✗ python -c 'import spacy; npl = spacy.load("en_core_web_sm"); npl("some text")'
[1] 54694 bus error python -c
Yet just installing the trf model (which also installs torch & co) did not cause the issue to appear:
(python311_clean) ➜ ~ python -c 'import spacy; npl = spacy.load("en_core_web_sm"); npl("some text")'
(python311_clean) ➜ ~ python -c 'import spacy; npl = spacy.load("en_core_web_trf"); npl("some text")'
(python311_clean) ➜ ~
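Because the crash happens only at interpreter exit, the comparison between environments can also be scripted by running the one-liner in a child process and inspecting its return code. The following is a minimal sketch, not part of the original report: the helper names `describe_exit`, `run_snippet` and `SPACY_SNIPPET` are hypothetical, and it assumes a POSIX system, where a negative return code means the child was killed by a signal.

```python
import signal
import subprocess
import sys

# The same one-liner used in the shell transcripts above.
SPACY_SNIPPET = 'import spacy; nlp = spacy.load("en_core_web_sm"); nlp("some text")'


def describe_exit(returncode: int) -> str:
    """Turn a subprocess return code into a short human-readable status."""
    if returncode == 0:
        return "exited normally"
    if returncode < 0:
        # On POSIX, subprocess reports death-by-signal as the negated signal number.
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = f"signal {-returncode}"
        return f"killed by {name}"
    return f"exited with status {returncode}"


def run_snippet(code: str) -> int:
    """Run `code` in a fresh interpreter of the current venv and return its exit status."""
    return subprocess.run([sys.executable, "-c", code]).returncode
```

Calling `describe_exit(run_snippet(SPACY_SNIPPET))` inside each venv should report a kill by SIGBUS in the affected environment and a normal exit in the clean one, assuming the crash reproduces the same way in a child process.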
What if you also install sentencepiece in the new venv?
All fine
(python311_clean) ➜ ~ pip install sentencepiece
Collecting sentencepiece
Downloading sentencepiece-0.1.99-cp311-cp311-macosx_11_0_arm64.whl (1.2 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.2/1.2 MB 19.0 MB/s eta 0:00:00
Installing collected packages: sentencepiece
Successfully installed sentencepiece-0.1.99
(python311_clean) ➜ ~ python -c 'import spacy; npl = spacy.load("en_core_web_trf"); npl("I have some text")'
(python311_clean) ➜ ~ python -c 'import spacy; npl = spacy.load("en_core_web_sm"); npl("I have some text")'
(python311_clean) ➜ ~
In general this seems to be a known issue related to sentencepiece, which is vendored in curated-tokenizers. I'm not currently sure exactly which conditions are necessary for you to run into it in practice, though.
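One way to narrow down those conditions is to compare which of the relevant packages are installed in the crashing and the clean venv. This is a rough sketch using only the standard library; the `SUSPECTS` list is a guess based on the packages mentioned in this thread, not a confirmed list of culprits, and `installed_versions` is a hypothetical helper name.

```python
from importlib import metadata

# Packages mentioned in this thread; an assumption, not a confirmed list of culprits.
SUSPECTS = ["spacy", "curated-tokenizers", "sentencepiece", "torch", "transformers"]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions
```

Running `installed_versions(SUSPECTS)` in both environments and diffing the results shows which package versions differ between the working and the failing setup.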
I think this is the same issue as https://github.com/google/sentencepiece/issues/579 . I am not sure though why the sentencepiece library is loaded. We link sentencepiece statically.
At any rate, the error comes from destructing absl::Flag. However, absl::Flag is not needed for library use of sentencepiece, but tends to creep back in as a dependency. I'll see if we can remove it in curated-tokenizers, which should avoid conflicts between different versions of sentencepiece.