Wrong architecture `tokenizers.cpython-39-darwin.so` (x86_64) when installing on Apple Silicon (arm64)
Hey there,
I just wanted to share an issue I came across when trying to get the transformers quick tour example working on my machine.
It seems that, currently, installing tokenizers via PyPI builds or bundles `tokenizers.cpython-39-darwin.so` for x86_64 instead of arm64 for users with Apple Silicon M1 computers.
System info: MacBook Air M1 2020 running macOS 11.0.1
To reproduce:

- Create a virtualenv and activate it:

  ```
  virtualenv venv-bad
  source venv-bad/bin/activate
  ```

- Install PyTorch (the easiest way I've found so far on arm64 is to install the nightly via pip):

  ```
  pip install --pre torch -f https://download.pytorch.org/whl/nightly/cpu/torch_nightly.html
  ```

- Install transformers (tokenizers will be installed as a dependency):

  ```
  pip install transformers
  ```

- Create a file `main.py` with the quick tour example:

  ```python
  from transformers import pipeline

  classifier = pipeline('sentiment-analysis')
  classifier('We are very happy to show you the 🤗 Transformers library.')
  ```

- Try running the quick tour example: `python main.py`
This results in the error:

```
ImportError: dlopen(/Users/khuynh/me/test/venv-bad/lib/python3.9/site-packages/tokenizers/tokenizers.cpython-39-darwin.so, 2): no suitable image found. Did find:
	/Users/khuynh/me/test/venv-bad/lib/python3.9/site-packages/tokenizers/tokenizers.cpython-39-darwin.so: mach-o, but wrong architecture
	/Users/khuynh/me/test/venv-bad/lib/python3.9/site-packages/tokenizers/tokenizers.cpython-39-darwin.so: mach-o, but wrong architecture
```
Full stacktrace:

```
(venv-bad) khuynh@kmba:test ‹main*›$ python main.py
Traceback (most recent call last):
  File "/Users/khuynh/me/test/temp.py", line 5, in <module>
    from transformers import pipeline
  File "/Users/khuynh/me/test/venv-bad/lib/python3.9/site-packages/transformers/__init__.py", line 2709, in __getattr__
    return super().__getattr__(name)
  File "/Users/khuynh/me/test/venv-bad/lib/python3.9/site-packages/transformers/file_utils.py", line 1821, in __getattr__
    module = self._get_module(self._class_to_module[name])
  File "/Users/khuynh/me/test/venv-bad/lib/python3.9/site-packages/transformers/__init__.py", line 2703, in _get_module
    return importlib.import_module("." + module_name, self.__name__)
  File "/opt/homebrew/Cellar/[email protected]/3.9.4/Frameworks/Python.framework/Versions/3.9/lib/python3.9/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "/Users/khuynh/me/test/venv-bad/lib/python3.9/site-packages/transformers/pipelines/__init__.py", line 25, in <module>
    from ..models.auto.configuration_auto import AutoConfig
  File "/Users/khuynh/me/test/venv-bad/lib/python3.9/site-packages/transformers/models/__init__.py", line 19, in <module>
    from . import (
  File "/Users/khuynh/me/test/venv-bad/lib/python3.9/site-packages/transformers/models/layoutlm/__init__.py", line 23, in <module>
    from .tokenization_layoutlm import LayoutLMTokenizer
  File "/Users/khuynh/me/test/venv-bad/lib/python3.9/site-packages/transformers/models/layoutlm/tokenization_layoutlm.py", line 19, in <module>
    from ..bert.tokenization_bert import BertTokenizer
  File "/Users/khuynh/me/test/venv-bad/lib/python3.9/site-packages/transformers/models/bert/tokenization_bert.py", line 23, in <module>
    from ...tokenization_utils import PreTrainedTokenizer, _is_control, _is_punctuation, _is_whitespace
  File "/Users/khuynh/me/test/venv-bad/lib/python3.9/site-packages/transformers/tokenization_utils.py", line 26, in <module>
    from .tokenization_utils_base import (
  File "/Users/khuynh/me/test/venv-bad/lib/python3.9/site-packages/transformers/tokenization_utils_base.py", line 69, in <module>
    from tokenizers import AddedToken
  File "/Users/khuynh/me/test/venv-bad/lib/python3.9/site-packages/tokenizers/__init__.py", line 79, in <module>
    from .tokenizers import (
ImportError: dlopen(/Users/khuynh/me/test/venv-bad/lib/python3.9/site-packages/tokenizers/tokenizers.cpython-39-darwin.so, 2): no suitable image found. Did find:
	/Users/khuynh/me/test/venv-bad/lib/python3.9/site-packages/tokenizers/tokenizers.cpython-39-darwin.so: mach-o, but wrong architecture
	/Users/khuynh/me/test/venv-bad/lib/python3.9/site-packages/tokenizers/tokenizers.cpython-39-darwin.so: mach-o, but wrong architecture
```
Looking at the architecture of the shared lib using `file`, we can see it's a dynamically linked x86_64 library:
```
(venv-bad) khuynh@kmba:test ‹main*›$ file /Users/khuynh/me/test/venv-bad/lib/python3.9/site-packages/tokenizers/tokenizers.cpython-39-darwin.so
/Users/khuynh/me/test/venv-bad/lib/python3.9/site-packages/tokenizers/tokenizers.cpython-39-darwin.so: Mach-O 64-bit dynamically linked shared library x86_64
```
Solution:
The solution I found requires installing the Rust toolchain on your machine and building the tokenizers module from source, so I think it's best viewed as a temporary workaround. I already have the Rust nightly toolchain installed on my machine, so that's what I used; otherwise, instructions for installing it are here.
- Clone tokenizers and enter the Python bindings directory:

  ```
  git clone [email protected]:huggingface/tokenizers.git
  cd tokenizers/bindings/python
  ```

- Install tokenizers:

  ```
  python setup.py install
  ```

- Now go back and successfully re-run the transformers quick tour.
We can also now see that the shared library has the proper architecture using `file`:

```
(venv-bad) khuynh@kmba:test ‹main*›$ file /Users/khuynh/me/test/venv2/lib/python3.9/site-packages/tokenizers-0.10.2-py3.9-macosx-11-arm64.egg/tokenizers/tokenizers.cpython-39-darwin.so
/Users/khuynh/me/test/venv2/lib/python3.9/site-packages/tokenizers-0.10.2-py3.9-macosx-11-arm64.egg/tokenizers/tokenizers.cpython-39-darwin.so: Mach-O 64-bit dynamically linked shared library arm64
```
I'm not super well versed in setuptools, so I'm not sure of the best way to fix this. Maybe release a separate pre-built `tokenizers.cpython-39-darwin.so` for arm64 users? I'd be happy to help if needed.
Hi @hkennyv and thank you for reporting this.
We don't build wheels for Apple Silicon at the moment because there is no environment for it on our GitHub CI (cf. https://github.com/actions/virtual-environments/issues/2187). The only way to have it working is, as you mentioned, to build it yourself. We'll add support for this as soon as it becomes available!
@n1t0 thanks for the response & explanation! I've +1'd the issue you linked to (hopefully) help :)
Hi there !
I've manually built binaries for tokenizers on M1 Macs and released them for tokenizers 0.11.6.
We'll try our best to keep building those by hand while waiting for https://github.com/actions/runner/issues/805.
Expect some delay between normal releases and m1 releases for now :)
Have a great day !
I followed the manual build instructions from the solution in the original comment, but am getting the error:

```
RuntimeError: Failed to import transformers.models.camembert.configuration_camembert because of the following error (look up to see its traceback): partially initialized module 'tokenizers.pre_tokenizers' has no attribute 'PreTokenizer' (most likely due to a circular import)
```
I am trying to run `AutoModelForTokenClassification`.
Hi @McPatate, thanks for building the bindings manually! Two months after your post, there was an announcement about a pre-release version of the macOS ARM64 runner. Will it make things easier?
@n1t0 you can also track the recent roadmap issue github/roadmap#528.
I'm having the same issue. After running:

```
pip install tokenizers
```

my machine builds the wheel, but for some reason it's always x86_64 architecture. I'm installing the latest version; should I try an earlier one?
Full output:

```
Collecting tokenizers
  Downloading tokenizers-0.12.1.tar.gz (220 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 220.7/220.7 kB 1.4 MB/s eta 0:00:00
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Building wheels for collected packages: tokenizers
  Building wheel for tokenizers (pyproject.toml) ... done
  Created wheel for tokenizers: filename=tokenizers-0.12.1-cp310-cp310-macosx_12_0_arm64.whl size=3760213 sha256=885cf11eb9f1fbd1a6be3366f2d5d8a7591890b96ed84a3121cc6bcd66be938a
  Stored in directory: /private/var/folders/k_/szxh8w4n0hl32b_j8dkxl76h0000gn/T/pip-ephem-wheel-cache-8p1jggwq/wheels/bd/22/bc/fa8337ce1ccf384c8fc4c1dbfa9cb1687934c0f24719082d49
Successfully built tokenizers
Installing collected packages: tokenizers
Successfully installed tokenizers-0.12.1
(ldm) alexandrecarqueja@MacBook-Pro stable-diffusion % file /opt/miniconda3/envs/ldm/lib/python3.10/site-packages/tokenizers/tokenizers.cpython-310-darwin.so
/opt/miniconda3/envs/ldm/lib/python3.10/site-packages/tokenizers/tokenizers.cpython-310-darwin.so: Mach-O 64-bit dynamically linked shared library x86_64
```
@WALEX2000 I'm not sure we have arm binaries for 0.12.1, we've been working on the CI with self-hosted runners but I'm unsure where we're at atm.
Maybe @Narsil can chime in :)
I have followed the instructions to build from source, and I still see the library compiled for x86_64.
I cloned the repo, made sure the Python environment is configured for a shared library, and ran `python setup.py install`.
tokenizers was installed in the virtual environment.
I ran the following command to check the compiled library:

```
file .venv/lib/python3.10/site-packages/tokenizers-0.13.0.dev0-py3.10-macosx-12.2-arm64.egg/tokenizers/tokenizers.cpython-310-darwin.so
```

Output:

```
.venv/lib/python3.10/site-packages/tokenizers-0.13.0.dev0-py3.10-macosx-12.2-arm64.egg/tokenizers/tokenizers.cpython-310-darwin.so: Mach-O 64-bit dynamically linked shared library x86_64
```
I do not understand why it is not compiling for the correct target.
I am on an M1 MacBook Pro. Python version is 3.10.7. Cargo version is 1.63.0.
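Since the extension is compiled by Cargo, a useful first check when the build targets the wrong architecture is which host triple the Rust toolchain defaults to. This is a small sketch, not part of any library here, that parses the standard `host:` line of `rustc -vV`; on Apple Silicon it should report `aarch64-apple-darwin`, while `x86_64-apple-darwin` would explain an Intel build:

```python
import subprocess

def rustc_host():
    """Return the default host triple reported by `rustc -vV`,
    or None if rustc is unavailable."""
    try:
        out = subprocess.run(
            ["rustc", "-vV"], capture_output=True, text=True, check=True
        )
    except (FileNotFoundError, subprocess.CalledProcessError):
        return None
    for line in out.stdout.splitlines():
        if line.startswith("host: "):
            return line.split(" ", 1)[1]
    return None
```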
> I am on an M1 MacBook Pro. Python version is 3.10.7. Cargo version is 1.63.0.

I ran into this as well. It turned out that I was using the brew-installed Rust rather than the rustup one. Try `which rustc` to make sure it is coming from the `~/.cargo` directory.
@spullara I did. It was the rustup one, not the brew one.
It may also be defaulting to the wrong toolchain. You might also try setting the default toolchain with:

```
rustup default stable-aarch64-apple-darwin
```

I think I also had to delete `rust-toolchain`, as when it was present it would change to the x86_64 toolchain. You can check that the right one is selected with:

```
rustup toolchain list
```

Edit: I was able to fix the `rust-toolchain` issue by doing:

```
rustup set default-host aarch64-apple-darwin
```
@spullara I ran `rustup toolchain list` and the output is as follows:

```
stable-aarch64-apple-darwin (default)
stable-x86_64-apple-darwin
```
`tokenizers==0.13.0` should now be built automatically for M1.
The errors you are seeing are super odd indeed; are you running in some sort of compatibility mode? I asked around other users on M1 and no one had the issue you're seeing :(
Could you try checking that the Rust install is OK by running `cargo test` within the `tokenizers/tokenizers/` directory, for instance?
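One "compatibility mode" worth ruling out is a Python interpreter running under Rosetta 2 translation, since pip then builds x86_64 wheels even on an M1. A small diagnostic sketch, assuming macOS's `sysctl.proc_translated` key (it reads `1` for translated processes; on other systems the `sysctl` lookup simply fails and is ignored):

```python
import platform
import subprocess

def interpreter_arch_info():
    """Return (machine, translated): the architecture the interpreter reports,
    and whether the process appears to run under Rosetta 2 (macOS only)."""
    machine = platform.machine()  # 'arm64' natively, 'x86_64' under Rosetta
    translated = False
    try:
        out = subprocess.run(
            ["sysctl", "-n", "sysctl.proc_translated"],
            capture_output=True, text=True,
        )
        translated = out.stdout.strip() == "1"
    except FileNotFoundError:
        pass  # sysctl not available; not macOS
    return machine, translated
```

If this reports `('x86_64', True)` on an M1, the interpreter itself is translated, and everything pip builds with it will target x86_64.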
> @spullara I ran `rustup toolchain list` and the output is as follows:
>
> ```
> stable-aarch64-apple-darwin (default)
> stable-x86_64-apple-darwin
> ```
Did you run this in the tokenizers/bindings/python directory?
I get this when I run it in `tokenizers/bindings/python`:

```
stable-aarch64-apple-darwin (default)
stable-x86_64-apple-darwin (override)
```
> I get this when I run it in `tokenizers/bindings/python`:
>
> ```
> stable-aarch64-apple-darwin (default)
> stable-x86_64-apple-darwin (override)
> ```

That means you need the command I mentioned to change the default host:

```
rustup set default-host aarch64-apple-darwin
```
Thanks.
@hkennyv thank you so much for this! It's July 2023, and following your instructions for tokenizers (and doing the same for safetensors) was the only way I could get all the huggingface dependencies I needed running.
Does anyone know if there's a better way yet that I couldn't find?
You're running a Python version that's too old (or too new). That's the only reason you'd need to build from source; everything else should be prebuilt.