
wrong architecture `tokenizers.cpython-39-darwin.so` (x86_64) when installing on apple silicon (arm64)

Open hkennyv opened this issue 3 years ago · 9 comments

Hey there,

I just wanted to share an issue I came by when trying to get the transformers quick tour example working on my machine.

It seems that, currently, installing tokenizers from PyPI bundles a tokenizers.cpython-39-darwin.so built for x86_64 instead of arm64 for users on Apple Silicon M1 machines.

System Info: MacBook Air (M1, 2020) running macOS 11.0.1
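A quick way to confirm the mismatch (a minimal sketch; the site-packages path is illustrative and depends on where your virtualenv lives):

# should print arm64 if the interpreter is running natively on Apple Silicon
python3 -c "import platform; print(platform.machine())"

# inspect the architecture of the bundled extension module
file venv-bad/lib/python3.9/site-packages/tokenizers/tokenizers.cpython-39-darwin.so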

To reproduce:

  1. create a virtualenv (virtualenv venv-bad) and activate it (source venv-bad/bin/activate)

  2. install PyTorch (the easiest way I've found so far on arm64 is to install the nightly via pip): pip install --pre torch -f https://download.pytorch.org/whl/nightly/cpu/torch_nightly.html

  3. install transformers (tokenizers will be installed as a dependency): pip install transformers

  4. create a file with the quick tour example:

main.py

from transformers import pipeline
classifier = pipeline('sentiment-analysis')

classifier('We are very happy to show you the 🤗 Transformers library.')
  5. try running the quick tour example (a condensed shell version of these steps is sketched just below)
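Condensed into shell commands, the steps above look roughly like this (a sketch; the venv name and nightly index URL are taken from the steps above):

# 1-2: fresh virtualenv plus the arm64 PyTorch nightly
virtualenv venv-bad
source venv-bad/bin/activate
pip install --pre torch -f https://download.pytorch.org/whl/nightly/cpu/torch_nightly.html

# 3: transformers pulls in tokenizers as a dependency
pip install transformers

# 4-5: run the quick tour example saved as main.py
python main.py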

Results in error:

ImportError: dlopen(/Users/khuynh/me/test/venv-bad/lib/python3.9/site-packages/tokenizers/tokenizers.cpython-39-darwin.so, 2): no suitable image found.  Did find:
        /Users/khuynh/me/test/venv-bad/lib/python3.9/site-packages/tokenizers/tokenizers.cpython-39-darwin.so: mach-o, but wrong architecture
        /Users/khuynh/me/test/venv-bad/lib/python3.9/site-packages/tokenizers/tokenizers.cpython-39-darwin.so: mach-o, but wrong architecture
Full stacktrace
(venv-bad) khuynh@kmba:test ‹main*›$ python main.py
Traceback (most recent call last):
  File "/Users/khuynh/me/test/temp.py", line 5, in 
    from transformers import pipeline
  File "/Users/khuynh/me/test/venv-bad/lib/python3.9/site-packages/transformers/__init__.py", line 2709, in __getattr__
    return super().__getattr__(name)
  File "/Users/khuynh/me/test/venv-bad/lib/python3.9/site-packages/transformers/file_utils.py", line 1821, in __getattr__
    module = self._get_module(self._class_to_module[name])
  File "/Users/khuynh/me/test/venv-bad/lib/python3.9/site-packages/transformers/__init__.py", line 2703, in _get_module
    return importlib.import_module("." + module_name, self.__name__)
  File "/opt/homebrew/Cellar/[email protected]/3.9.4/Frameworks/Python.framework/Versions/3.9/lib/python3.9/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "/Users/khuynh/me/test/venv-bad/lib/python3.9/site-packages/transformers/pipelines/__init__.py", line 25, in 
    from ..models.auto.configuration_auto import AutoConfig
  File "/Users/khuynh/me/test/venv-bad/lib/python3.9/site-packages/transformers/models/__init__.py", line 19, in 
    from . import (
  File "/Users/khuynh/me/test/venv-bad/lib/python3.9/site-packages/transformers/models/layoutlm/__init__.py", line 23, in 
    from .tokenization_layoutlm import LayoutLMTokenizer
  File "/Users/khuynh/me/test/venv-bad/lib/python3.9/site-packages/transformers/models/layoutlm/tokenization_layoutlm.py", line 19, in 
    from ..bert.tokenization_bert import BertTokenizer
  File "/Users/khuynh/me/test/venv-bad/lib/python3.9/site-packages/transformers/models/bert/tokenization_bert.py", line 23, in 
    from ...tokenization_utils import PreTrainedTokenizer, _is_control, _is_punctuation, _is_whitespace
  File "/Users/khuynh/me/test/venv-bad/lib/python3.9/site-packages/transformers/tokenization_utils.py", line 26, in 
    from .tokenization_utils_base import (
  File "/Users/khuynh/me/test/venv-bad/lib/python3.9/site-packages/transformers/tokenization_utils_base.py", line 69, in 
    from tokenizers import AddedToken
  File "/Users/khuynh/me/test/venv-bad/lib/python3.9/site-packages/tokenizers/__init__.py", line 79, in 
    from .tokenizers import (
ImportError: dlopen(/Users/khuynh/me/test/venv-bad/lib/python3.9/site-packages/tokenizers/tokenizers.cpython-39-darwin.so, 2): no suitable image found.  Did find:
        /Users/khuynh/me/test/venv-bad/lib/python3.9/site-packages/tokenizers/tokenizers.cpython-39-darwin.so: mach-o, but wrong architecture
        /Users/khuynh/me/test/venv-bad/lib/python3.9/site-packages/tokenizers/tokenizers.cpython-39-darwin.so: mach-o, but wrong architecture

Looking at the architecture of the shared lib using file, we can see it's a dynamically linked x86_64 library:

(venv-bad) khuynh@kmba:test ‹main*›$ file /Users/khuynh/me/test/venv-bad/lib/python3.9/site-packages/tokenizers/tokenizers.cpython-39-darwin.so
/Users/khuynh/me/test/venv-bad/lib/python3.9/site-packages/tokenizers/tokenizers.cpython-39-darwin.so: Mach-O 64-bit dynamically linked shared library x86_64

Solution:

The solution I found requires installing the Rust toolchain on your machine and building the tokenizers module from source, so I think of it as a temporary workaround. I already have the Rust nightly toolchain installed on my machine, so that's what I used. Otherwise, instructions for installing it are here.

  1. clone tokenizers:
git clone git@github.com:huggingface/tokenizers.git
  2. cd tokenizers/bindings/python
  3. install tokenizers: python setup.py install
  4. now go back and successfully re-run the transformers quick tour (a condensed sketch of these steps follows below)
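Roughly, as shell commands (a sketch; the rustup one-liner is the standard installer from rustup.rs and is only needed if you don't already have a Rust toolchain; the virtualenv from the repro should still be active):

# install the Rust toolchain if you don't already have one
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

# build and install the Python bindings from source
git clone git@github.com:huggingface/tokenizers.git
cd tokenizers/bindings/python
python setup.py install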

We can also now see that the shared library is the proper architecture using file:

(venv-bad) khuynh@kmba:test ‹main*›$ file /Users/khuynh/me/test/venv2/lib/python3.9/site-packages/tokenizers-0.10.2-py3.9-macosx-11-arm64.egg/tokenizers/tokenizers.cpython-39-darwin.so
/Users/khuynh/me/test/venv2/lib/python3.9/site-packages/tokenizers-0.10.2-py3.9-macosx-11-arm64.egg/tokenizers/tokenizers.cpython-39-darwin.so: Mach-O 64-bit dynamically linked shared library arm64

I'm not super well versed in setuptools, so I'm not sure of the best way to fix this. Maybe release a separate pre-built tokenizers.cpython-39-darwin.so for arm64 users? I'd be happy to help if needed.

hkennyv avatar May 24 '21 02:05 hkennyv

Hi @hkennyv and thank you for reporting this.

We don't build wheels for Apple Silicon at the moment because there is no environment for this on our Github CI. (cf https://github.com/actions/virtual-environments/issues/2187). The only way to have it working is, as you mentioned, to build it yourself. We'll add support for this as soon as it is available!

n1t0 avatar May 24 '21 19:05 n1t0

@n1t0 thanks for the response & explanation! I've +1'd the issue you linked, to (hopefully) help :)

hkennyv avatar May 25 '21 01:05 hkennyv

Hi there !

I've manually built binaries for tokenizers on ARM M1 and released them for tokenizers 0.11.6.

We'll try our best to keep building those by hand while waiting for https://github.com/actions/runner/issues/805.

Expect some delay between normal releases and m1 releases for now :)

Have a great day !

McPatate avatar Mar 01 '22 17:03 McPatate

I followed the manual build instructions from the solution in the original comment, but am getting the error

RuntimeError: Failed to import transformers.models.camembert.configuration_camembert because of the following error (look up to see its traceback): partially initialized module 'tokenizers.pre_tokenizers' has no attribute 'PreTokenizer' (most likely due to a circular import)

I am trying to run AutoModelForTokenClassification

etan18 avatar Jun 30 '22 16:06 etan18

Hi @McPatate, thanks for building the bindings manually! Two months after your post, there was an announcement about a pre-release version of the macOS-ARM64 runner. Will it make things easier?

@n1t0 you can also track the recent roadmap issue github/roadmap#528.

vi3itor avatar Jul 01 '22 05:07 vi3itor

I'm having the same issue. After running pip install tokenizers, my machine builds the wheel, but for some reason it's always the x86_64 architecture. I'm installing the latest version; should I try an earlier one?

Full output:

Collecting tokenizers
  Downloading tokenizers-0.12.1.tar.gz (220 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 220.7/220.7 kB 1.4 MB/s eta 0:00:00
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Building wheels for collected packages: tokenizers
  Building wheel for tokenizers (pyproject.toml) ... done
  Created wheel for tokenizers: filename=tokenizers-0.12.1-cp310-cp310-macosx_12_0_arm64.whl size=3760213 sha256=885cf11eb9f1fbd1a6be3366f2d5d8a7591890b96ed84a3121cc6bcd66be938a
  Stored in directory: /private/var/folders/k_/szxh8w4n0hl32b_j8dkxl76h0000gn/T/pip-ephem-wheel-cache-8p1jggwq/wheels/bd/22/bc/fa8337ce1ccf384c8fc4c1dbfa9cb1687934c0f24719082d49
Successfully built tokenizers
Installing collected packages: tokenizers
Successfully installed tokenizers-0.12.1

(ldm) alexandrecarqueja@MacBook-Pro stable-diffusion % file /opt/miniconda3/envs/ldm/lib/python3.10/site-packages/tokenizers/tokenizers.cpython-310-darwin.so
/opt/miniconda3/envs/ldm/lib/python3.10/site-packages/tokenizers/tokenizers.cpython-310-darwin.so: Mach-O 64-bit dynamically linked shared library x86_64
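One thing worth checking in a situation like this (a hedged guess, not a confirmed diagnosis: if the conda environment's Python is itself an x86_64 build running under Rosetta, pip will produce x86_64 builds; the commands below are a sketch):

# architecture the running interpreter reports
python -c "import platform; print(platform.machine())"

# architecture of the interpreter binary itself
file "$(which python)"

# prints 1 if this command is running under Rosetta 2 translation (e.g. in an x86_64 shell)
sysctl -n sysctl.proc_translated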

WALEX2000 avatar Sep 14 '22 18:09 WALEX2000

@WALEX2000 I'm not sure we have ARM binaries for 0.12.1; we've been working on the CI with self-hosted runners, but I'm unsure where we're at atm.

Maybe @Narsil can chime in :)

McPatate avatar Sep 15 '22 09:09 McPatate

I have followed the instructions to build from source, and I still see the library compiled for x86_64.

I cloned the repo, made sure the Python environment is configured for a shared-library build, and ran python setup.py install.

tokenizers was installed in the virtual environment.

I ran the following command to check the compiled lib that was built.

file .venv/lib/python3.10/site-packages/tokenizers-0.13.0.dev0-py3.10-macosx-12.2-arm64.egg/tokenizers/tokenizers.cpython-310-darwin.so

Output:

.venv/lib/python3.10/site-packages/tokenizers-0.13.0.dev0-py3.10-macosx-12.2-arm64.egg/tokenizers/tokenizers.cpython-310-darwin.so: Mach-O 64-bit dynamically linked shared library x86_64

I do not understand why it is not compiling for the correct target.

I am on an M1 MacBook Pro. Python version is 3.10.7. Cargo version is 1.63.0.

thetonus avatar Sep 20 '22 16:09 thetonus

I am on an M1 MacBook Pro. Python version is 3.10.7. Cargo version is 1.63.0.

I ran into this as well. It turned out that I was using the brew-installed Rust rather than the rustup one. Try which rustc to make sure it is coming from the ~/.cargo directory.
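A quick sketch of that check (assuming a default rustup install under ~/.cargo):

# should resolve to ~/.cargo/bin/rustc, not a Homebrew path
which rustc

# the host triple should be aarch64-apple-darwin on an M1
rustc -vV | grep host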

spullara avatar Sep 21 '22 20:09 spullara

@spullara I did. It was the rustup one and not Brew.

thetonus avatar Sep 23 '22 16:09 thetonus

It may also be defaulting to the wrong toolchain. You might also try setting the default toolchain with

rustup default stable-aarch64-apple-darwin

I think I also had to delete the rust-toolchain file, since when it was present it would switch to the x86_64 toolchain. You can check to make sure the right one is selected with

rustup toolchain list

Edit: I was able to fix the rust-toolchain issue by doing

rustup set default-host aarch64-apple-darwin
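For reference, a sketch of those checks using standard rustup commands (run inside tokenizers/bindings/python, where the override was taking effect):

# show the active toolchain and the default host triple
rustup show

# list per-directory overrides (e.g. one pinning x86_64 inside the repo)
rustup override list

# clear an override for the current directory, if any
rustup override unset

# make aarch64-apple-darwin the default host going forward
rustup set default-host aarch64-apple-darwin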

spullara avatar Sep 23 '22 17:09 spullara

@spullara I ran rustup toolchain list and the output is as follows:

stable-aarch64-apple-darwin (default)
stable-x86_64-apple-darwin

thetonus avatar Sep 26 '22 14:09 thetonus

tokenizers==0.13.0 should now be built automatically for M1.

The errors you are seeing are super odd indeed; are you running into some sort of compatibility mode? I asked around other users on M1 and no one had the issue you were seeing :(

Could you try and check the Rust install is OK by running cargo test within the tokenizers/tokenizers/ directory, for instance?
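A minimal sketch of that sanity check (run from a clone of the tokenizers repo):

cd tokenizers/tokenizers

# run the Rust test suite to sanity-check the toolchain
cargo test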

Narsil avatar Sep 27 '22 15:09 Narsil

@spullara I ran rustup toolchain list and the output is as follows:

stable-aarch64-apple-darwin (default)
stable-x86_64-apple-darwin

Did you run this in the tokenizers/bindings/python directory?

spullara avatar Sep 27 '22 18:09 spullara

I get this when I run it in tokenizers/bindings/python:

stable-aarch64-apple-darwin (default)
stable-x86_64-apple-darwin (override)

thetonus avatar Oct 04 '22 19:10 thetonus

I get this when I run it in tokenizers/bindings/python:

stable-aarch64-apple-darwin (default)
stable-x86_64-apple-darwin (override)

That means you need to run the command I used to change the default host:

rustup set default-host aarch64-apple-darwin

spullara avatar Oct 05 '22 06:10 spullara

Thanks.

thetonus avatar Oct 05 '22 15:10 thetonus

@hkennyv thank you so much for this! It's July 2023, and following your instructions for tokenizers (and doing the same thing for safetensors) was the only way I could get all the huggingface dependencies I needed running.

Does anyone know if there's a better way yet that I couldn't find?

bolducp avatar Jul 21 '23 15:07 bolducp

You're running a Python version that's too old (or too new). That's the only reason you'd need to build from source; everything else should be prebuilt.
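A quick way to check this (a sketch; --only-binary tells pip to refuse source builds, so it fails loudly if no prebuilt wheel matches your interpreter):

# interpreter version pip will target
python -V

# error out instead of building from source if no matching wheel is published
pip install --only-binary=:all: tokenizers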

Narsil avatar Jul 25 '23 07:07 Narsil