
ERROR: Failed building wheel for tokenizers

Open outdoorblake opened this issue 1 year ago • 61 comments

System Info

I can't seem to get past this error "ERROR: Could not build wheels for tokenizers, which is required to install pyproject.toml-based projects" when installing transformers with pip. An ML friend of mine also tried on their own instance and hit the same problem; they helped me troubleshoot, but we weren't able to get past it, so I think it's possibly a recent issue.

I am following the transformers README install instructions step by step, with a venv and PyTorch ready to go. Pip is also fully up to date. One suggestion in the error output is to install a Rust compiler, but we both felt that doesn't seem like the right next step, because one usually isn't required when installing the transformers package and the README makes no mention of needing a Rust compiler.

Thanks in advance! -Blake

Full output below:

command: pip install transformers

Collecting transformers
  Using cached transformers-4.21.1-py3-none-any.whl (4.7 MB)
Requirement already satisfied: tqdm>=4.27 in ./venv/lib/python3.9/site-packages (from transformers) (4.64.0)
Requirement already satisfied: huggingface-hub<1.0,>=0.1.0 in ./venv/lib/python3.9/site-packages (from transformers) (0.9.0)
Requirement already satisfied: pyyaml>=5.1 in ./venv/lib/python3.9/site-packages (from transformers) (6.0)
Requirement already satisfied: regex!=2019.12.17 in ./venv/lib/python3.9/site-packages (from transformers) (2022.8.17)
Collecting tokenizers!=0.11.3,<0.13,>=0.11.1
  Using cached tokenizers-0.12.1.tar.gz (220 kB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Requirement already satisfied: numpy>=1.17 in ./venv/lib/python3.9/site-packages (from transformers) (1.23.2)
Requirement already satisfied: packaging>=20.0 in ./venv/lib/python3.9/site-packages (from transformers) (21.3)
Requirement already satisfied: filelock in ./venv/lib/python3.9/site-packages (from transformers) (3.8.0)
Requirement already satisfied: requests in ./venv/lib/python3.9/site-packages (from transformers) (2.26.0)
Requirement already satisfied: typing-extensions>=3.7.4.3 in ./venv/lib/python3.9/site-packages (from huggingface-hub<1.0,>=0.1.0->transformers) (4.3.0)
Requirement already satisfied: pyparsing!=3.0.5,>=2.0.2 in ./venv/lib/python3.9/site-packages (from packaging>=20.0->transformers) (3.0.9)
Requirement already satisfied: urllib3<1.27,>=1.21.1 in ./venv/lib/python3.9/site-packages (from requests->transformers) (1.26.7)
Requirement already satisfied: idna<4,>=2.5 in ./venv/lib/python3.9/site-packages (from requests->transformers) (3.3)
Requirement already satisfied: certifi>=2017.4.17 in ./venv/lib/python3.9/site-packages (from requests->transformers) (2021.10.8)
Requirement already satisfied: charset-normalizer~=2.0.0 in ./venv/lib/python3.9/site-packages (from requests->transformers) (2.0.7)
Building wheels for collected packages: tokenizers
  Building wheel for tokenizers (pyproject.toml) ... error
  error: subprocess-exited-with-error

× Building wheel for tokenizers (pyproject.toml) did not run successfully.
│ exit code: 1
╰─> [51 lines of output]
  running bdist_wheel
  running build
  running build_py
  creating build
  creating build/lib.macosx-12-arm64-cpython-39
  creating build/lib.macosx-12-arm64-cpython-39/tokenizers
  copying py_src/tokenizers/__init__.py -> build/lib.macosx-12-arm64-cpython-39/tokenizers
  creating build/lib.macosx-12-arm64-cpython-39/tokenizers/models
  copying py_src/tokenizers/models/__init__.py -> build/lib.macosx-12-arm64-cpython-39/tokenizers/models
  creating build/lib.macosx-12-arm64-cpython-39/tokenizers/decoders
  copying py_src/tokenizers/decoders/__init__.py -> build/lib.macosx-12-arm64-cpython-39/tokenizers/decoders
  creating build/lib.macosx-12-arm64-cpython-39/tokenizers/normalizers
  copying py_src/tokenizers/normalizers/__init__.py -> build/lib.macosx-12-arm64-cpython-39/tokenizers/normalizers
  creating build/lib.macosx-12-arm64-cpython-39/tokenizers/pre_tokenizers
  copying py_src/tokenizers/pre_tokenizers/__init__.py -> build/lib.macosx-12-arm64-cpython-39/tokenizers/pre_tokenizers
  creating build/lib.macosx-12-arm64-cpython-39/tokenizers/processors
  copying py_src/tokenizers/processors/__init__.py -> build/lib.macosx-12-arm64-cpython-39/tokenizers/processors
  creating build/lib.macosx-12-arm64-cpython-39/tokenizers/trainers
  copying py_src/tokenizers/trainers/__init__.py -> build/lib.macosx-12-arm64-cpython-39/tokenizers/trainers
  creating build/lib.macosx-12-arm64-cpython-39/tokenizers/implementations
  copying py_src/tokenizers/implementations/byte_level_bpe.py -> build/lib.macosx-12-arm64-cpython-39/tokenizers/implementations
  copying py_src/tokenizers/implementations/sentencepiece_unigram.py -> build/lib.macosx-12-arm64-cpython-39/tokenizers/implementations
  copying py_src/tokenizers/implementations/sentencepiece_bpe.py -> build/lib.macosx-12-arm64-cpython-39/tokenizers/implementations
  copying py_src/tokenizers/implementations/base_tokenizer.py -> build/lib.macosx-12-arm64-cpython-39/tokenizers/implementations
  copying py_src/tokenizers/implementations/__init__.py -> build/lib.macosx-12-arm64-cpython-39/tokenizers/implementations
  copying py_src/tokenizers/implementations/char_level_bpe.py -> build/lib.macosx-12-arm64-cpython-39/tokenizers/implementations
  copying py_src/tokenizers/implementations/bert_wordpiece.py -> build/lib.macosx-12-arm64-cpython-39/tokenizers/implementations
  creating build/lib.macosx-12-arm64-cpython-39/tokenizers/tools
  copying py_src/tokenizers/tools/__init__.py -> build/lib.macosx-12-arm64-cpython-39/tokenizers/tools
  copying py_src/tokenizers/tools/visualizer.py -> build/lib.macosx-12-arm64-cpython-39/tokenizers/tools
  copying py_src/tokenizers/__init__.pyi -> build/lib.macosx-12-arm64-cpython-39/tokenizers
  copying py_src/tokenizers/models/__init__.pyi -> build/lib.macosx-12-arm64-cpython-39/tokenizers/models
  copying py_src/tokenizers/decoders/__init__.pyi -> build/lib.macosx-12-arm64-cpython-39/tokenizers/decoders
  copying py_src/tokenizers/normalizers/__init__.pyi -> build/lib.macosx-12-arm64-cpython-39/tokenizers/normalizers
  copying py_src/tokenizers/pre_tokenizers/__init__.pyi -> build/lib.macosx-12-arm64-cpython-39/tokenizers/pre_tokenizers
  copying py_src/tokenizers/processors/__init__.pyi -> build/lib.macosx-12-arm64-cpython-39/tokenizers/processors
  copying py_src/tokenizers/trainers/__init__.pyi -> build/lib.macosx-12-arm64-cpython-39/tokenizers/trainers
  copying py_src/tokenizers/tools/visualizer-styles.css -> build/lib.macosx-12-arm64-cpython-39/tokenizers/tools
  running build_ext
  running build_rust
  error: can't find Rust compiler

  If you are using an outdated pip version, it is possible a prebuilt wheel is available for this package but pip is not able to install from it. Installing from the wheel would avoid the need for a Rust compiler.
  
  To update pip, run:
  
      pip install --upgrade pip
  
  and then retry package installation.
  
  If you did intend to build this package from source, try installing a Rust compiler from your system package manager and ensure it is on the PATH during installation. Alternatively, rustup (available at https://rustup.rs) is the recommended way to download and update the Rust compiler toolchain.
  [end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for tokenizers
Failed to build tokenizers
ERROR: Could not build wheels for tokenizers, which is required to install pyproject.toml-based projects

Who can help?

No response

Information

  • [ ] The official example scripts
  • [ ] My own modified scripts

Tasks

  • [ ] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • [ ] My own task or dataset (give details below)

Reproduction

command: pip install transformers

(Same full output as shown under System Info above.)

Expected behavior

I would expect the transformers library to install without throwing an error when all prerequisites for installation are met.

outdoorblake avatar Aug 23 '22 23:08 outdoorblake

I am aware of this past issue - it is very similar but these suggested fixes seem dated and are not working.

outdoorblake avatar Aug 23 '22 23:08 outdoorblake

Let me move this over to tokenizers, which should be in a better position to help.

LysandreJik avatar Aug 24 '22 10:08 LysandreJik

Also having this issue; I hadn't run into it before.

erik-dunteman avatar Aug 24 '22 16:08 erik-dunteman

Are you guys on M1? If that's the case, it's unfortunately expected (https://github.com/huggingface/tokenizers/issues/932).

If not, what platform are you on (OS, hardware, Python version)?

Basically, for M1 you need to install from source for now; fixes are coming soon (https://github.com/huggingface/tokenizers/pull/1055).

Also, the error message says you're missing a Rust compiler, so it might be enough to just install one (https://www.rust-lang.org/tools/install) and maybe the install will go through. (It's easier if we prebuild those, but still.)
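For reference, a minimal sketch of that route on macOS/Linux (the rustup one-liner and the env file path below are the installer's documented defaults, not anything specific to this thread):

    # install the Rust toolchain via rustup
    curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
    # make cargo/rustc visible in the current shell
    source "$HOME/.cargo/env"
    rustc --version
    # retry, so pip can build the tokenizers wheel from source
    pip install transformers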

Narsil avatar Aug 25 '22 08:08 Narsil

I'm on an Apple M2 and I can't install tokenizers. The same thing works fine on my Linux machine. How can we install tokenizers on M1/M2 Apple Macs?

alibrahimzada avatar Aug 25 '22 16:08 alibrahimzada

M1 user here. I got the same error, installing the rust compiler fixed this for me.

stephantul avatar Aug 26 '22 05:08 stephantul

It's all the same: we couldn't prebuild the library for M1 (which is an arm64 chip) because GitHub didn't have an arm64 action runner. We did manually push some prebuilt binaries, but it seems they contained some issues. Since then, GitHub has enabled the runner to work on M1 machines (so all macOS + arm64), so hopefully this will be fixed in the next release.

Since this is a "major" release (still not at 1.0), we're going to do a full sweep of slow tests in transformers (which is our biggest user), and hopefully this should work out of the box for M1 onwards after that!

Narsil avatar Aug 26 '22 08:08 Narsil

@stephantul where did you get the Rust compiler? I installed it from https://www.rust-lang.org/tools/install and pip3 install tokenizers still fails.

alibrahimzada avatar Aug 26 '22 21:08 alibrahimzada

@alibrahimzada I installed it with homebrew

stephantul avatar Aug 27 '22 04:08 stephantul

@alibrahimzada you might also need pip install setuptools_rust, and your Python environment needs to be built with shared libraries (it basically depends on how you installed Python; for pyenv, for instance, you will need this: https://github.com/pyenv/pyenv/issues/392).

(Careful, it's now PYTHON_CONFIGURE_OPTS="--enable-shared" pyenv install ....)
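A rough sketch of that combination, assuming pyenv is the Python manager in use (the Python version below is only an example):

    # rebuild the interpreter with shared libraries so the Rust extension can link against it
    env PYTHON_CONFIGURE_OPTS="--enable-shared" pyenv install 3.9.13
    pyenv local 3.9.13
    # then, inside the (re)created environment
    pip install setuptools_rust
    pip install tokenizers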

Narsil avatar Aug 29 '22 15:08 Narsil

Having the same problem and none of the above suggestions worked. Any ETA on when we can expect the next release that fixes this bug?

argonaut76 avatar Sep 13 '22 02:09 argonaut76

I am on M1 and managed to work around this in the following way: I installed a Rust compiler using brew and then initialized it:

    brew install rustup
    rustup-init

Then I restarted the console and checked that it is installed with rustc --version. It turned out you also have to set up the path:

    export PATH="$HOME/.cargo/bin:$PATH"

anibzlv avatar Sep 20 '22 12:09 anibzlv

I have done everything @Narsil and @anibzlv have suggested. Still no luck... (I am on an M1, 2021.)

prabu-ssb avatar Sep 22 '22 12:09 prabu-ssb

Oddly enough, the library works just fine inside a virtual environment on my MBP with the M1 chip. So for now, that's my approach.

argonaut76 avatar Sep 22 '22 13:09 argonaut76

I was able to install from source and it seems to be working.
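For anyone who wants to try the same route, a hedged sketch of a from-source install (the Python bindings live under bindings/python in the tokenizers repository; a Rust toolchain is assumed to be installed and on PATH):

    git clone https://github.com/huggingface/tokenizers
    cd tokenizers/bindings/python
    pip install setuptools_rust
    pip install -e .
    # or force a source build of the published sdist instead:
    # pip install --no-binary tokenizers tokenizers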

prabu-ssb avatar Sep 22 '22 13:09 prabu-ssb

Has anyone tried to install the latest version on M1? The prebuilt binaries should be released now!

Narsil avatar Sep 27 '22 14:09 Narsil

I tried installing on M1 just now in a python3.10 virtual environment. All I had to do was pip install setuptools_rust. Then I could install all the required packages.
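In the same spirit, a sketch of that fresh-environment flow, assuming a native arm64 Python 3.10 is available as python3.10:

    python3.10 -m venv .venv
    source .venv/bin/activate
    pip install --upgrade pip
    pip install setuptools_rust
    pip install transformers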

manyu252 avatar Sep 27 '22 19:09 manyu252

I'm running on M2 with Python 3.8 and am still running into this problem. Is there any workaround other than installing from source?

TR-EIP avatar Oct 06 '22 08:10 TR-EIP

I thought Python 3.8 was not built for M1/M2... so this library cannot ship a prebuilt wheel for it.

Are you sure you are not in compatibility mode, i.e. actually running an x86_64 build of 3.8? https://stackoverflow.com/questions/69511006/cant-install-pyenv-3-8-5-on-macos-big-sur-with-m1-chip
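One quick, generic way to check which interpreter is actually running (native arm64 vs. an x86_64 build under Rosetta); nothing here is specific to tokenizers:

    python -c "import platform; print(platform.machine(), platform.python_version())"
    # arm64  -> native Apple Silicon build
    # x86_64 -> Intel/Rosetta build, which pulls Intel wheels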

Narsil avatar Oct 06 '22 10:10 Narsil

Try telling pip to prefer binary wheels; it'll probably give you an older version of tokenizers, but otherwise you would need to build from source. It does depend on the version requirements for tokenizers.

The proper fix would be for Huggingface to create wheels for Apple Silicon.
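A hedged sketch of the prefer-binary idea (the requirement specifier below just mirrors the constraint from the pip output earlier in the thread; adjust to your project):

    # prefer prebuilt wheels over newer source-only releases
    pip install --prefer-binary "tokenizers>=0.11.1,!=0.11.3,<0.13"

Note that --prefer-binary only changes which version pip picks; if no compatible wheel exists at all, pip will still fall back to building from source.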

Vargol avatar Nov 10 '22 14:11 Vargol

We already build wheels for Apple Silicon! Just not for Python 3.8, which isn't supposed to exist on M1 (only 3.9, 3.10, and 3.11 now).

Narsil avatar Nov 10 '22 14:11 Narsil

Where's the binary wheel for 0.12.1? PyPI can't find it. I'm having to use 0.11.6 to avoid having "install Rust" as an instruction for installing user software.

Vargol avatar Nov 10 '22 14:11 Vargol

GitHub did not provide an action runner for M1 at the time, so builds were manual (and infrequent).

Any reason you cannot upgrade to 0.13.2 or 0.12.6?

But yes, for some older versions the M1 builds are not present, and we're not doing retroactive builds, unfortunately. I'm basically the sole maintainer here, and I don't really have the time to figure out all the old versions for all platforms (but ensuring that once a platform is supported it keeps on working is something we're committed to).

Narsil avatar Nov 10 '22 15:11 Narsil

In the project there are a number of other third-party Python modules dependent on tokenizers; from yesterday's build I got the following version dependency from pip:

Collecting tokenizers!=0.11.3,<0.13,>=0.11.1

Not sure why it's not picking up 0.12.6; setting pip to prefer binary installed 0.11.6.

EDIT: answering my own question: https://pypi.org/simple/tokenizers/ goes straight from 0.12.1 to 0.13.0; there is no 0.12.6.
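If it helps, one way to check whether a prebuilt wheel exists for a given version on your platform without installing anything (the version below is only an example):

    # succeeds only if a compatible binary wheel is available for this interpreter/platform
    pip download "tokenizers==0.13.1" --only-binary :all: --no-deps -d /tmp/wheelcheck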

Vargol avatar Nov 10 '22 15:11 Vargol

Hmm, interesting. Could you try force-installing 0.12.6 and see if that fixes it?

Could you also share your env (Python version + hardware (M1 I guess) + requirements.txt)?

I don't remember the command, but there's a way to make pip explain its decisions regarding versions.

Narsil avatar Nov 10 '22 15:11 Narsil

I got confused with 0.11.6, sorry!

And I don't see the builds of 0.12 for arm; I'm guessing we moved to 0.13 first.

TBH there "shouldn't" be any major differences between 0.12.1 and 0.13, so if you could switch, that might work (I was cautious since we updated the PyO3 bindings version and that triggered a lot of code changes, even if we didn't intend any functional changes).

transformers is probably the one limiting tokenizers (we do that to let tokenizers make eventual breaking changes). Maybe you could try updating it?
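A small sketch of that last suggestion; the exact bound is illustrative, on the assumption that newer transformers releases relax the tokenizers pin, and it only applies if nothing else in the project pins transformers:

    # a newer transformers allows a newer tokenizers, which ships arm64 wheels
    pip install --upgrade "transformers>=4.22"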

Narsil avatar Nov 10 '22 15:11 Narsil

It's a bit convoluted ATM, as different OSes currently require a different version of gfpgan unless you install torch up front.

So I do

pip install "torch<1.13" "torchvision<1.14"

Main requirements.txt:

    -r requirements-base.txt
    protobuf==3.19.6
    torch<1.13.0
    torchvision<0.14.0
    -e .

requirements-base.txt (pip will resolve the version which matches torch):

    albumentations
    dependency_injector==4.40.0
    diffusers
    einops
    eventlet
    flask==2.1.3
    flask_cors==3.0.10
    flask_socketio==5.3.0
    flaskwebgui==0.3.7
    getpass_asterisk
    gfpgan
    huggingface-hub
    imageio
    imageio-ffmpeg
    kornia
    numpy
    omegaconf
    opencv-python
    pillow
    pip>=22
    pudb
    pyreadline3
    pytorch-lightning==1.7.7
    realesrgan
    scikit-image>=0.19
    send2trash
    streamlit
    taming-transformers-rom1504
    test-tube
    torch-fidelity
    torchmetrics
    transformers==4.21.*
    git+https://github.com/openai/CLIP.git@main#egg=clip
    git+https://github.com/Birch-san/k-diffusion.git@mps#egg=k-diffusion
    git+https://github.com/invoke-ai/clipseg.git@models-rename#egg=clipseg

Vargol avatar Nov 10 '22 15:11 Vargol

I'll have to see why we limit transformers, assuming the reasoning hasn't been lost to history.

Vargol avatar Nov 10 '22 15:11 Vargol

I am on M1 and managed to work around this in the following way: I installed a Rust compiler using brew and then initialized it:

    brew install rustup
    rustup-init

Then I restarted the console and checked that it is installed with rustc --version. It turned out you also have to set up the path:

    export PATH="$HOME/.cargo/bin:$PATH"

I used this way and it worked for me on M2. Thank you so much

xuunnis123 avatar Nov 13 '22 04:11 xuunnis123

For Windows (the sketch after this list covers the remaining steps):

  • Install Visual Studio (latest version, 2022)
  • Install the Python development workload
  • Install the Desktop development with C++ workload
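For completeness, a hedged sketch of the remaining Windows steps once the Visual Studio C++ build tools are in place (run in PowerShell or a developer prompt; rustup-init.exe comes from https://rustup.rs):

    # install the Rust toolchain (it uses the MSVC linker from Visual Studio)
    .\rustup-init.exe -y
    # then, in a fresh terminal
    pip install --upgrade pip setuptools_rust
    pip install transformers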

cpietsch avatar Dec 04 '22 18:12 cpietsch