
bus error upon exiting the program after using spacy on mac M1

Open koder-ua opened this issue 1 year ago • 7 comments

How to reproduce the behaviour

On an M1 Mac, any code that uses spaCy to parse a doc fails with the error below. I can only test this on my laptop; the same code works fine on a Linux machine.

[1] 73089 bus error

upon exit. This happens with both the sm and trf models.

Your Environment

Info about spaCy

  • spaCy version: 3.7.2
  • Platform: macOS-14.1.2-arm64-arm-64bit
  • Python version: 3.11.3
  • Pipelines: en_core_web_sm (3.7.0), en_core_web_md (3.7.1), en_core_web_trf (3.7.2), en_core_web_lg (3.7.0)

koder-ua avatar Dec 19 '23 22:12 koder-ua

Here is some binary traceback info:

https://gist.github.com/koder-ua/8fd3e3fd795674b01d1ddbeda9400999

koder-ua avatar Dec 19 '23 22:12 koder-ua

Thanks for the report!

The info provided makes this look specific to the trf model, in particular curated-tokenizers. If you have a minute, could you create a new venv without installing torch and with only the en_core_web_sm model and see if you still get the same error?

adrianeboyd avatar Dec 20 '23 12:12 adrianeboyd

@adrianeboyd yep, it seems like you are right: on a clean python3.11 with only spacy and en_core_web_sm installed, everything works fine.

python3.11 with only spacy and en_core_web_sm

~ python -c 'import spacy; npl = spacy.load("en_core_web_sm"); npl("some text")'
~

python3.11 with pytorch & co

✗ python -c 'import spacy; npl = spacy.load("en_core_web_sm"); npl("some text")'
[1]    54694 bus error  python -c

Yet just installing the trf model (which also installs torch & co) did not cause the issue to appear:

(python311_clean) ➜  ~ python -c 'import spacy; npl = spacy.load("en_core_web_sm"); npl("some text")'
(python311_clean) ➜  ~ python -c 'import spacy; npl = spacy.load("en_core_web_trf"); npl("some text")'
(python311_clean) ➜  ~

koder-ua avatar Dec 20 '23 13:12 koder-ua
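For anyone comparing environments the same way, here is a minimal sketch that runs the one-liner in a fresh interpreter and reports whether the process was killed by a signal at exit. The helper name `run_and_report` is hypothetical and not part of spaCy. The sketch assumes a POSIX system, where `subprocess` reports a signal-terminated child as a negative return code.

```python
import signal
import subprocess
import sys


def run_and_report(code: str) -> str:
    """Run `code` in a fresh interpreter and describe how the process ended."""
    proc = subprocess.run([sys.executable, "-c", code], capture_output=True, text=True)
    rc = proc.returncode
    if rc < 0:
        # On POSIX, a negative return code means the child was killed by signal -rc.
        return f"killed by {signal.Signals(-rc).name}"
    return f"exited with code {rc}"


if __name__ == "__main__":
    # Same kind of one-liner as in the thread; needs spaCy and the model installed.
    snippet = 'import spacy; nlp = spacy.load("en_core_web_sm"); nlp("some text")'
    print(run_and_report(snippet))
```

Running this once per venv makes a crash at interpreter shutdown visible even when the shell does not print it, since a bus error would show up as `killed by SIGBUS` rather than a normal exit code.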

What happens if you also install sentencepiece in the new venv?

adrianeboyd avatar Dec 20 '23 14:12 adrianeboyd

All fine

(python311_clean) ➜  ~ pip install sentencepiece
Collecting sentencepiece
  Downloading sentencepiece-0.1.99-cp311-cp311-macosx_11_0_arm64.whl (1.2 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.2/1.2 MB 19.0 MB/s eta 0:00:00
Installing collected packages: sentencepiece
Successfully installed sentencepiece-0.1.99
(python311_clean) ➜  ~ python -c 'import spacy; npl = spacy.load("en_core_web_trf"); npl("I have some text")'
(python311_clean) ➜  ~ python -c 'import spacy; npl = spacy.load("en_core_web_sm"); npl("I have some text")'
(python311_clean) ➜  ~

koder-ua avatar Dec 20 '23 14:12 koder-ua

In general this seems to be a known issue related to sentencepiece, which is vendored in curated-tokenizers. I'm not currently sure exactly which conditions are necessary for you to run into it in practice, though.

adrianeboyd avatar Dec 21 '23 10:12 adrianeboyd
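Since the exact conditions are unclear, it may help to record which of the packages mentioned in this thread are present in a failing and a working environment. The following is a rough sketch using only the standard library. The package list is an assumption based on the names discussed here and `installed_versions` is a hypothetical helper, not a spaCy API.

```python
from importlib.metadata import PackageNotFoundError, version

# Distributions mentioned in this thread; adjust the list as needed.
PACKAGES = ["spacy", "torch", "sentencepiece", "curated-tokenizers"]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions(PACKAGES).items():
        print(f"{name}: {ver or 'not installed'}")
```

Comparing the output from the two venvs side by side shows which distributions or versions differ, which narrows down the candidates for the conflict.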

I think this is the same issue as https://github.com/google/sentencepiece/issues/579 . I am not sure though why the sentencepiece library is loaded. We link sentencepiece statically.

At any rate, the error comes from destructing absl::Flag. However, absl::Flag is not needed for library use of sentencepiece, but it tends to creep back in as a dependency. I'll see if we can remove it in curated-tokenizers, which should avoid conflicts between different versions of sentencepiece.

danieldk avatar Jan 23 '24 09:01 danieldk