Wrong architecture `tokenizers.cpython-39-darwin.so` (x86_64) when installing on Apple Silicon (arm64)
Hey there,

I just wanted to share an issue I came across when trying to get the transformers quick tour example working on my machine. It seems that, currently, installing tokenizers via PyPI builds or bundles the `tokenizers.cpython-39-darwin.so` for x86_64 instead of arm64 for users with Apple Silicon M1 computers.
System info: MacBook Air M1 2020 with macOS 11.0.1
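As a quick sanity check (my suggestion, not part of the original report), you can ask the interpreter itself which architecture it is running as; a Python launched under Rosetta 2 reports x86_64 even on an M1:

```python
import platform

# On a native Apple Silicon build of Python this prints 'arm64'; under
# Rosetta 2 (or on an Intel Mac) it prints 'x86_64'. On Linux you would
# see 'x86_64' or 'aarch64' instead.
print(platform.machine())
```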
To reproduce:

- Create a virtualenv and activate it:

```
virtualenv venv-bad
source venv-bad/bin/activate
```

- Install PyTorch (the easiest way I've found so far on arm64 is to install the nightly via pip):

```
pip install --pre torch -f https://download.pytorch.org/whl/nightly/cpu/torch_nightly.html
```

- Install transformers (tokenizers will be installed as a dependency):

```
pip install transformers
```

- Create a file `main.py` with the quick tour example:

```python
from transformers import pipeline

classifier = pipeline('sentiment-analysis')
classifier('We are very happy to show you the 🤗 Transformers library.')
```

- Try running the quick tour example.
This results in the error:

```
ImportError: dlopen(/Users/khuynh/me/test/venv-bad/lib/python3.9/site-packages/tokenizers/tokenizers.cpython-39-darwin.so, 2): no suitable image found. Did find:
	/Users/khuynh/me/test/venv-bad/lib/python3.9/site-packages/tokenizers/tokenizers.cpython-39-darwin.so: mach-o, but wrong architecture
	/Users/khuynh/me/test/venv-bad/lib/python3.9/site-packages/tokenizers/tokenizers.cpython-39-darwin.so: mach-o, but wrong architecture
```
Full stacktrace:

```
(venv-bad) khuynh@kmba:test ‹main*›$ python main.py
Traceback (most recent call last):
  File "/Users/khuynh/me/test/temp.py", line 5, in <module>
    from transformers import pipeline
  File "/Users/khuynh/me/test/venv-bad/lib/python3.9/site-packages/transformers/__init__.py", line 2709, in __getattr__
    return super().__getattr__(name)
  File "/Users/khuynh/me/test/venv-bad/lib/python3.9/site-packages/transformers/file_utils.py", line 1821, in __getattr__
    module = self._get_module(self._class_to_module[name])
  File "/Users/khuynh/me/test/venv-bad/lib/python3.9/site-packages/transformers/__init__.py", line 2703, in _get_module
    return importlib.import_module("." + module_name, self.__name__)
  File "/opt/homebrew/Cellar/[email protected]/3.9.4/Frameworks/Python.framework/Versions/3.9/lib/python3.9/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "/Users/khuynh/me/test/venv-bad/lib/python3.9/site-packages/transformers/pipelines/__init__.py", line 25, in <module>
    from ..models.auto.configuration_auto import AutoConfig
  File "/Users/khuynh/me/test/venv-bad/lib/python3.9/site-packages/transformers/models/__init__.py", line 19, in <module>
    from . import (
  File "/Users/khuynh/me/test/venv-bad/lib/python3.9/site-packages/transformers/models/layoutlm/__init__.py", line 23, in <module>
    from .tokenization_layoutlm import LayoutLMTokenizer
  File "/Users/khuynh/me/test/venv-bad/lib/python3.9/site-packages/transformers/models/layoutlm/tokenization_layoutlm.py", line 19, in <module>
    from ..bert.tokenization_bert import BertTokenizer
  File "/Users/khuynh/me/test/venv-bad/lib/python3.9/site-packages/transformers/models/bert/tokenization_bert.py", line 23, in <module>
    from ...tokenization_utils import PreTrainedTokenizer, _is_control, _is_punctuation, _is_whitespace
  File "/Users/khuynh/me/test/venv-bad/lib/python3.9/site-packages/transformers/tokenization_utils.py", line 26, in <module>
    from .tokenization_utils_base import (
  File "/Users/khuynh/me/test/venv-bad/lib/python3.9/site-packages/transformers/tokenization_utils_base.py", line 69, in <module>
    from tokenizers import AddedToken
  File "/Users/khuynh/me/test/venv-bad/lib/python3.9/site-packages/tokenizers/__init__.py", line 79, in <module>
    from .tokenizers import (
ImportError: dlopen(/Users/khuynh/me/test/venv-bad/lib/python3.9/site-packages/tokenizers/tokenizers.cpython-39-darwin.so, 2): no suitable image found. Did find:
	/Users/khuynh/me/test/venv-bad/lib/python3.9/site-packages/tokenizers/tokenizers.cpython-39-darwin.so: mach-o, but wrong architecture
	/Users/khuynh/me/test/venv-bad/lib/python3.9/site-packages/tokenizers/tokenizers.cpython-39-darwin.so: mach-o, but wrong architecture
```
Looking at the architecture of the shared library using `file`, we can see it's a dynamically linked x86_64 library:

```
(venv-bad) khuynh@kmba:test ‹main*›$ file /Users/khuynh/me/test/venv-bad/lib/python3.9/site-packages/tokenizers/tokenizers.cpython-39-darwin.so
/Users/khuynh/me/test/venv-bad/lib/python3.9/site-packages/tokenizers/tokenizers.cpython-39-darwin.so: Mach-O 64-bit dynamically linked shared library x86_64
```
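For the curious: the architecture `file` reports is encoded in the first eight bytes of the Mach-O header. Here is a minimal sketch of reading it yourself; the `macho_arch` helper is mine, and it only handles thin little-endian 64-bit files (universal "fat" binaries use a different magic number):

```python
import struct

MH_MAGIC_64 = 0xFEEDFACF        # little-endian 64-bit Mach-O magic number
CPU_TYPE_X86_64 = 0x01000007
CPU_TYPE_ARM64 = 0x0100000C

def macho_arch(header: bytes) -> str:
    """Return the CPU architecture encoded in a 64-bit Mach-O header."""
    magic, cputype = struct.unpack("<II", header[:8])
    if magic != MH_MAGIC_64:
        return "not a little-endian 64-bit Mach-O file"
    return {CPU_TYPE_X86_64: "x86_64", CPU_TYPE_ARM64: "arm64"}.get(cputype, hex(cputype))

# e.g. macho_arch(open(path_to_so, "rb").read(8)) on the .so above
# would report the same architecture as `file`.
```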
Solution:

The solution I found requires installing the Rust toolchain on your machine and installing the tokenizers module from source, so I see this as a temporary workaround. I already have the Rust nightly toolchain installed on my machine, so that's what I used; otherwise, instructions for installing it are here.
- Clone tokenizers:

```
git clone [email protected]:huggingface/tokenizers.git
```

- Change into the Python bindings directory:

```
cd tokenizers/bindings/python
```

- Install tokenizers:

```
python setup.py install
```

- Now go back and successfully re-run the transformers quick tour.
We can also now see that the shared library has the proper architecture using `file`:

```
(venv-bad) khuynh@kmba:test ‹main*›$ file /Users/khuynh/me/test/venv2/lib/python3.9/site-packages/tokenizers-0.10.2-py3.9-macosx-11-arm64.egg/tokenizers/tokenizers.cpython-39-darwin.so
/Users/khuynh/me/test/venv2/lib/python3.9/site-packages/tokenizers-0.10.2-py3.9-macosx-11-arm64.egg/tokenizers/tokenizers.cpython-39-darwin.so: Mach-O 64-bit dynamically linked shared library arm64
```
I'm not super well versed in setuptools, so I'm not sure of the best way to fix this. Maybe release a different pre-built `tokenizers.cpython-39-darwin.so` for arm64 users? I'd be happy to help if needed.
Hi @hkennyv, and thank you for reporting this.

We don't build wheels for Apple Silicon at the moment because there is no environment for it on our GitHub CI (cf. https://github.com/actions/virtual-environments/issues/2187). The only way to have it working is, as you mentioned, to build it yourself. We'll add support for this as soon as it is available!
@n1t0 thanks for the response & explanation! i've +1'd the issue you linked to (hopefully) help :)
Hi there!

I've manually built binaries for tokenizers on ARM M1 and released them for tokenizers 0.11.6. We'll try our best to keep building those by hand while waiting for https://github.com/actions/runner/issues/805. Expect some delay between normal releases and M1 releases for now :)

Have a great day!
I followed the manual build instructions from the solution in the original comment, but am getting the error:

```
RuntimeError: Failed to import transformers.models.camembert.configuration_camembert because of the following error (look up to see its traceback): partially initialized module 'tokenizers.pre_tokenizers' has no attribute 'PreTokenizer' (most likely due to a circular import)
```

I am trying to run `AutoModelForTokenClassification`.
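As an aside (my illustration, not from the thread): the "partially initialized module" wording in that error is Python's generic circular-import symptom, which can be reproduced with two toy modules:

```python
import pathlib
import subprocess
import sys
import tempfile
import textwrap

def demo_circular_import() -> str:
    """Reproduce Python's 'partially initialized module' error in a sandbox."""
    with tempfile.TemporaryDirectory() as tmp:
        tmp = pathlib.Path(tmp)
        # a.py imports b before defining VALUE...
        (tmp / "a.py").write_text(textwrap.dedent("""
            import b          # b imports a back before a finishes executing
            VALUE = 42
        """))
        # ...and b.py imports a back and touches a.VALUE too early.
        (tmp / "b.py").write_text(textwrap.dedent("""
            import a
            print(a.VALUE)    # a is only partially initialized at this point
        """))
        proc = subprocess.run([sys.executable, "-c", "import a"],
                              cwd=tmp, capture_output=True, text=True)
        return proc.stderr

print(demo_circular_import())
```

The stderr ends with `AttributeError: partially initialized module 'a' has no attribute 'VALUE' (most likely due to a circular import)`; in the report above, the cycle involves the compiled `tokenizers` extension instead of a pure-Python module.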
Hi @McPatate, thanks for building the bindings manually! Two months after your post, there was an announcement about pre-release version of the macOS-ARM64 runner. Will it make things easier?
@n1t0 you can also track the recent roadmap issue github/roadmap#528.
I'm having the same issue. After running:

```
pip install tokenizers
```

my machine builds the wheel, but for some reason the installed library is always x86_64. I'm installing the latest version; should I try an earlier one?

Full output:
```
Collecting tokenizers
  Downloading tokenizers-0.12.1.tar.gz (220 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 220.7/220.7 kB 1.4 MB/s eta 0:00:00
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Building wheels for collected packages: tokenizers
  Building wheel for tokenizers (pyproject.toml) ... done
  Created wheel for tokenizers: filename=tokenizers-0.12.1-cp310-cp310-macosx_12_0_arm64.whl size=3760213 sha256=885cf11eb9f1fbd1a6be3366f2d5d8a7591890b96ed84a3121cc6bcd66be938a
  Stored in directory: /private/var/folders/k_/szxh8w4n0hl32b_j8dkxl76h0000gn/T/pip-ephem-wheel-cache-8p1jggwq/wheels/bd/22/bc/fa8337ce1ccf384c8fc4c1dbfa9cb1687934c0f24719082d49
Successfully built tokenizers
Installing collected packages: tokenizers
Successfully installed tokenizers-0.12.1
(ldm) alexandrecarqueja@MacBook-Pro stable-diffusion % file /opt/miniconda3/envs/ldm/lib/python3.10/site-packages/tokenizers/tokenizers.cpython-310-darwin.so
/opt/miniconda3/envs/ldm/lib/python3.10/site-packages/tokenizers/tokenizers.cpython-310-darwin.so: Mach-O 64-bit dynamically linked shared library x86_64
```
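One diagnostic worth running here (my suggestion, not from the thread): the wheel's platform tag comes from the interpreter doing the building, while the compiled extension's architecture comes from the Rust toolchain. When the two disagree, as in the output above (an `arm64`-tagged wheel containing an x86_64 `.so`), the Rust toolchain is the usual suspect; later comments in this thread point at exactly that. You can inspect the interpreter's side directly:

```python
import platform
import sysconfig

# The platform tag this interpreter would stamp on locally built wheels,
# e.g. 'macosx-12.0-arm64' for a native Apple Silicon Python or
# 'macosx-12.0-x86_64' for one running under Rosetta 2.
print(sysconfig.get_platform())
# The architecture the interpreter process is actually running as.
print(platform.machine())
```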
@WALEX2000 I'm not sure we have arm binaries for 0.12.1; we've been working on the CI with self-hosted runners, but I'm unsure where we're at at the moment. Maybe @Narsil can chime in :)
I have followed the instructions to build from source, and I still see the library compiled for x86_64. I cloned the repo, made sure the Python environment is configured for shared libraries, and ran `python setup.py install`. tokenizers was installed in the virtual environment.

I ran the following command to check the compiled lib:

```
file .venv/lib/python3.10/site-packages/tokenizers-0.13.0.dev0-py3.10-macosx-12.2-arm64.egg/tokenizers/tokenizers.cpython-310-darwin.so
```

Output:

```
.venv/lib/python3.10/site-packages/tokenizers-0.13.0.dev0-py3.10-macosx-12.2-arm64.egg/tokenizers/tokenizers.cpython-310-darwin.so: Mach-O 64-bit dynamically linked shared library x86_64
```

I do not understand why it is not compiling for the correct target. I am on an M1 MacBook Pro. Python version is 3.10.7. Cargo version is 1.63.0.
> I am on an M1 MacBook Pro. Python version is 3.10.7. Cargo version is 1.63.0.

I ran into this as well. It turned out that I was using the Homebrew-installed Rust rather than the rustup one. Try `which rustc` to make sure it is coming from the `~/.cargo` directory.
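That check can be scripted; here is a small sketch using Python's `shutil.which`, which resolves a command the same way the shell's `which` does. The `looks_like_rustup` helper and its `~/.cargo` heuristic are mine, not part of the thread:

```python
import shutil
from typing import Optional

def looks_like_rustup(path: Optional[str]) -> bool:
    """Heuristic: rustup-managed toolchains are shimmed through ~/.cargo/bin."""
    return path is not None and "/.cargo/" in path

# Resolve rustc on PATH (None if not installed) and apply the heuristic.
print(looks_like_rustup(shutil.which("rustc")))
```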
@spullara I did. It was the rustup one, not the Homebrew one.
It may also be defaulting to the wrong toolchain. You might try setting the default toolchain with:

```
rustup default stable-aarch64-apple-darwin
```

I think I also had to delete the rust-toolchain file, as when it was present it would switch to the x86_64 toolchain. You can check that the right one is selected with:

```
rustup toolchain list
```

Edit: I was able to fix the rust-toolchain issue by doing:

```
rustup set default-host aarch64-apple-darwin
```
@spullara I ran `rustup toolchain list` and the output is as follows:

```
stable-aarch64-apple-darwin (default)
stable-x86_64-apple-darwin
```
`tokenizers==0.13.0` should now be built automatically for M1.

The errors you are seeing are very odd indeed; are you running in some sort of compatibility mode? I asked around other users on M1 and no one had the issue you are seeing :(

Could you check that the Rust install is OK by running `cargo test` within the `tokenizers/tokenizers/` directory, for instance?
> @spullara I ran `rustup toolchain list` and the output is as follows:
>
> ```
> stable-aarch64-apple-darwin (default)
> stable-x86_64-apple-darwin
> ```

Did you run this in the `tokenizers/bindings/python` directory?
I get this when I run it in `tokenizers/bindings/python`:

```
stable-aarch64-apple-darwin (default)
stable-x86_64-apple-darwin (override)
```
> I get this when I run it in `tokenizers/bindings/python`:
>
> ```
> stable-aarch64-apple-darwin (default)
> stable-x86_64-apple-darwin (override)
> ```

That means you need to run the command I used to change the default host:

```
rustup set default-host aarch64-apple-darwin
```
Thanks.
@hkennyv thank you so much for this! It's July 2023, and following your instructions for tokenizers (and the same approach for safetensors) was the only way I could get the Hugging Face dependencies I needed all running.

Does anyone know if there's a better way yet that I couldn't find?
You're running a Python version that is too old (or too new). That's the only reason for needing to build from source; everything else should be prebuilt.