tokenizers
ERROR: Failed building wheel for tokenizers
System Info
I can't get past the error "ERROR: Could not build wheels for tokenizers, which is required to install pyproject.toml-based projects" when installing transformers with pip. An ML friend of mine also tried on their own instance and hit the same problem, then helped me troubleshoot, but we couldn't get past it, so I think it's possibly a recent issue.
I am following the transformers README install instructions step by step, with a venv and PyTorch ready to go. Pip is also fully up to date. The error output suggests installing a Rust compiler, but we both felt this doesn't seem like the right next step: a Rust compiler usually isn't required when installing the transformers package, and the README makes no mention of needing one.
Thanks in advance!
-Blake
Full output below:
command: `pip install transformers`

```
Collecting transformers
  Using cached transformers-4.21.1-py3-none-any.whl (4.7 MB)
Requirement already satisfied: tqdm>=4.27 in ./venv/lib/python3.9/site-packages (from transformers) (4.64.0)
Requirement already satisfied: huggingface-hub<1.0,>=0.1.0 in ./venv/lib/python3.9/site-packages (from transformers) (0.9.0)
Requirement already satisfied: pyyaml>=5.1 in ./venv/lib/python3.9/site-packages (from transformers) (6.0)
Requirement already satisfied: regex!=2019.12.17 in ./venv/lib/python3.9/site-packages (from transformers) (2022.8.17)
Collecting tokenizers!=0.11.3,<0.13,>=0.11.1
  Using cached tokenizers-0.12.1.tar.gz (220 kB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Requirement already satisfied: numpy>=1.17 in ./venv/lib/python3.9/site-packages (from transformers) (1.23.2)
Requirement already satisfied: packaging>=20.0 in ./venv/lib/python3.9/site-packages (from transformers) (21.3)
Requirement already satisfied: filelock in ./venv/lib/python3.9/site-packages (from transformers) (3.8.0)
Requirement already satisfied: requests in ./venv/lib/python3.9/site-packages (from transformers) (2.26.0)
Requirement already satisfied: typing-extensions>=3.7.4.3 in ./venv/lib/python3.9/site-packages (from huggingface-hub<1.0,>=0.1.0->transformers) (4.3.0)
Requirement already satisfied: pyparsing!=3.0.5,>=2.0.2 in ./venv/lib/python3.9/site-packages (from packaging>=20.0->transformers) (3.0.9)
Requirement already satisfied: urllib3<1.27,>=1.21.1 in ./venv/lib/python3.9/site-packages (from requests->transformers) (1.26.7)
Requirement already satisfied: idna<4,>=2.5 in ./venv/lib/python3.9/site-packages (from requests->transformers) (3.3)
Requirement already satisfied: certifi>=2017.4.17 in ./venv/lib/python3.9/site-packages (from requests->transformers) (2021.10.8)
Requirement already satisfied: charset-normalizer~=2.0.0 in ./venv/lib/python3.9/site-packages (from requests->transformers) (2.0.7)
Building wheels for collected packages: tokenizers
  Building wheel for tokenizers (pyproject.toml) ... error
  error: subprocess-exited-with-error

  × Building wheel for tokenizers (pyproject.toml) did not run successfully.
  │ exit code: 1
  ╰─> [51 lines of output]
      running bdist_wheel
      running build
      running build_py
      creating build
      creating build/lib.macosx-12-arm64-cpython-39
      creating build/lib.macosx-12-arm64-cpython-39/tokenizers
      copying py_src/tokenizers/__init__.py -> build/lib.macosx-12-arm64-cpython-39/tokenizers
      creating build/lib.macosx-12-arm64-cpython-39/tokenizers/models
      copying py_src/tokenizers/models/__init__.py -> build/lib.macosx-12-arm64-cpython-39/tokenizers/models
      creating build/lib.macosx-12-arm64-cpython-39/tokenizers/decoders
      copying py_src/tokenizers/decoders/__init__.py -> build/lib.macosx-12-arm64-cpython-39/tokenizers/decoders
      creating build/lib.macosx-12-arm64-cpython-39/tokenizers/normalizers
      copying py_src/tokenizers/normalizers/__init__.py -> build/lib.macosx-12-arm64-cpython-39/tokenizers/normalizers
      creating build/lib.macosx-12-arm64-cpython-39/tokenizers/pre_tokenizers
      copying py_src/tokenizers/pre_tokenizers/__init__.py -> build/lib.macosx-12-arm64-cpython-39/tokenizers/pre_tokenizers
      creating build/lib.macosx-12-arm64-cpython-39/tokenizers/processors
      copying py_src/tokenizers/processors/__init__.py -> build/lib.macosx-12-arm64-cpython-39/tokenizers/processors
      creating build/lib.macosx-12-arm64-cpython-39/tokenizers/trainers
      copying py_src/tokenizers/trainers/__init__.py -> build/lib.macosx-12-arm64-cpython-39/tokenizers/trainers
      creating build/lib.macosx-12-arm64-cpython-39/tokenizers/implementations
      copying py_src/tokenizers/implementations/byte_level_bpe.py -> build/lib.macosx-12-arm64-cpython-39/tokenizers/implementations
      copying py_src/tokenizers/implementations/sentencepiece_unigram.py -> build/lib.macosx-12-arm64-cpython-39/tokenizers/implementations
      copying py_src/tokenizers/implementations/sentencepiece_bpe.py -> build/lib.macosx-12-arm64-cpython-39/tokenizers/implementations
      copying py_src/tokenizers/implementations/base_tokenizer.py -> build/lib.macosx-12-arm64-cpython-39/tokenizers/implementations
      copying py_src/tokenizers/implementations/__init__.py -> build/lib.macosx-12-arm64-cpython-39/tokenizers/implementations
      copying py_src/tokenizers/implementations/char_level_bpe.py -> build/lib.macosx-12-arm64-cpython-39/tokenizers/implementations
      copying py_src/tokenizers/implementations/bert_wordpiece.py -> build/lib.macosx-12-arm64-cpython-39/tokenizers/implementations
      creating build/lib.macosx-12-arm64-cpython-39/tokenizers/tools
      copying py_src/tokenizers/tools/__init__.py -> build/lib.macosx-12-arm64-cpython-39/tokenizers/tools
      copying py_src/tokenizers/tools/visualizer.py -> build/lib.macosx-12-arm64-cpython-39/tokenizers/tools
      copying py_src/tokenizers/__init__.pyi -> build/lib.macosx-12-arm64-cpython-39/tokenizers
      copying py_src/tokenizers/models/__init__.pyi -> build/lib.macosx-12-arm64-cpython-39/tokenizers/models
      copying py_src/tokenizers/decoders/__init__.pyi -> build/lib.macosx-12-arm64-cpython-39/tokenizers/decoders
      copying py_src/tokenizers/normalizers/__init__.pyi -> build/lib.macosx-12-arm64-cpython-39/tokenizers/normalizers
      copying py_src/tokenizers/pre_tokenizers/__init__.pyi -> build/lib.macosx-12-arm64-cpython-39/tokenizers/pre_tokenizers
      copying py_src/tokenizers/processors/__init__.pyi -> build/lib.macosx-12-arm64-cpython-39/tokenizers/processors
      copying py_src/tokenizers/trainers/__init__.pyi -> build/lib.macosx-12-arm64-cpython-39/tokenizers/trainers
      copying py_src/tokenizers/tools/visualizer-styles.css -> build/lib.macosx-12-arm64-cpython-39/tokenizers/tools
      running build_ext
      running build_rust
      error: can't find Rust compiler

      If you are using an outdated pip version, it is possible a prebuilt wheel is available for this package but pip is not able to install from it. Installing from the wheel would avoid the need for a Rust compiler.

      To update pip, run:

          pip install --upgrade pip

      and then retry package installation.

      If you did intend to build this package from source, try installing a Rust compiler from your system package manager and ensure it is on the PATH during installation. Alternatively, rustup (available at https://rustup.rs) is the recommended way to download and update the Rust compiler toolchain.
      [end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for tokenizers
Failed to build tokenizers
ERROR: Could not build wheels for tokenizers, which is required to install pyproject.toml-based projects
```
Who can help?
No response
Information
- [ ] The official example scripts
- [ ] My own modified scripts
Tasks
- [ ] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- [ ] My own task or dataset (give details below)
Reproduction
command: pip install transformers
(The full pip log is identical to the output pasted above, ending in `error: can't find Rust compiler`.)
Expected behavior
I would expect the transformers library to install without throwing an error when all installation prerequisites are met.
I am aware of this past issue - it is very similar, but the suggested fixes seem dated and are not working.
Let me move this over to tokenizers, which should be in a better position to help.
Also having this issue; I hadn't run into it before.
Are you on M1? If so, this is unfortunately expected. (https://github.com/huggingface/tokenizers/issues/932)
If not, what platform are you on? (OS, hardware, Python version?)
Basically, for M1 you need to install from source for now (fixes are coming soon: https://github.com/huggingface/tokenizers/pull/1055).
Also, the error message says you're missing a Rust compiler; it might be enough to just install one (https://www.rust-lang.org/tools/install) and the install may then go through. (It's easier when we prebuild binaries, but still.)
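Before reinstalling anything, a quick hedged check (assumes a POSIX shell) to confirm whether a Rust compiler is actually visible to the build:

```shell
# Check whether rustc is on PATH, which is what the tokenizers source build needs.
if command -v rustc >/dev/null 2>&1; then
  echo "rustc found: $(rustc --version)"
else
  echo "rustc missing: install via https://rustup.rs, then retry pip install"
fi
```

If `rustc` shows up here but the pip build still fails, the build subprocess may be running with a different PATH than your shell.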
I'm on an Apple M2 and can't install tokenizers; the same thing works fine on my Linux machine. How can we install tokenizers on M1/M2 Apple Macs?
M1 user here. I got the same error, installing the rust compiler fixed this for me.
It's all the same: we couldn't prebuild the library for M1 (which is an arm64 chip) because GitHub didn't have an arm64 action runner. We did manually push some prebuilt binaries, but it seems they contained some issues. Since then, GitHub has enabled runners on M1 machines (so all macOS+arm64), so hopefully this will be fixed in the next release.
Since this is a "major" release (still not 1.0), we're going to do a full sweep of slow tests in transformers (our biggest user), and after that this should hopefully work out of the box on M1 onwards!
@stephantul where did you get the Rust compiler? I installed it from https://www.rust-lang.org/tools/install and `pip3 install tokenizers` still fails.
@alibrahimzada I installed it with homebrew
@alibrahimzada you might also need `pip install setuptools_rust`, and your Python build needs to be shared (it depends on how you installed Python; for pyenv, for instance, you will need this: https://github.com/pyenv/pyenv/issues/392 - careful, it's now `PYTHON_CONFIGURE_OPTS="--enable-shared" pyenv install ...`).
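To check whether the interpreter you are currently using was built as a shared library (the pyenv requirement mentioned above), a small stdlib-only sketch:

```python
import sysconfig

# Py_ENABLE_SHARED is 1 when CPython was configured with --enable-shared,
# 0 for a static build, and may be None on some platforms.
shared = sysconfig.get_config_var("Py_ENABLE_SHARED")
print("built with --enable-shared:", shared == 1)
```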
Having the same problem and none of the above suggestions worked. Any ETA on when we can expect the next release that fixes this bug?
I am on M1 and managed to work around this in the following way: I installed a Rust compiler using brew and then initialized it:

```
brew install rustup
rustup-init
```

Then I restarted the console and checked that it is installed with `rustc --version`. It turned out you also have to set up the path:

```
export PATH="$HOME/.cargo/bin:$PATH"
```
I have done everything @Narsil and @anibzlv suggested. Still no luck... (I'm on an M1, 2021.)
Oddly enough, the library works just fine inside a virtual environment on my MBP with the M1 chip. So for now, that's my approach.
I could install from source and it seems to be working.
Has anyone tried to install the latest version on M1? The prebuilt binaries should be released now!
I tried installing on M1 just now in a Python 3.10 virtual environment. All I had to do was `pip install setuptools_rust`; then I could install all the required packages.
I'm running on M2 with Python 3.8 and am still running into this problem. Is there any workaround other than installing from source?
I thought Python 3.8 was not built for M1/M2, so this library cannot build it for you.
Are you sure you are not in compatibility mode, i.e. not really using a native 3.8? https://stackoverflow.com/questions/69511006/cant-install-pyenv-3-8-5-on-macos-big-sur-with-m1-chip
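One quick way to tell whether your interpreter is a native arm64 build or an x86_64 build running under Rosetta (a stdlib-only diagnostic sketch, not specific to tokenizers):

```python
import platform

# On a native Apple Silicon build this reports "arm64"; an Intel build running
# under Rosetta reports "x86_64", in which case pip fetches x86_64 wheels.
print("machine:", platform.machine())
print("platform:", platform.platform())
```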
Try telling pip to prefer binaries; it'll probably give you an older version of tokenizers, but you wouldn't need to build from source. It does depend on the version requirements for tokenizers.
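A hedged sketch of that approach; `--prefer-binary` is a standard pip flag, and the version pin here mirrors the constraint transformers 4.21 places on tokenizers:

```shell
# Ask pip to prefer prebuilt wheels over source distributions, even if that
# means resolving an older tokenizers release.
pip install --prefer-binary "tokenizers!=0.11.3,<0.13,>=0.11.1"
```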
The proper fix would be for Hugging Face to create wheels for Apple Silicon.
We already build wheels for Apple Silicon! Just not for Python 3.8, which isn't supposed to exist on M1 (only 3.9, 3.10, and 3.11 now).
Where's the binary wheel for 0.12.1? PyPI can't find it. I'm having to use 0.11.6 to avoid having "install Rust" as an instruction for installing user software.
GitHub did not provide an action runner for M1 at the time, so builds were manual (and infrequent).
Any reason you cannot upgrade to 0.13.2 or 0.12.6?
But yes, for some older versions the M1 wheels are not present; we're not doing retroactive builds, unfortunately. I'm basically the sole maintainer here, and I don't really have the time to figure out all the old versions for all platforms (but ensuring that once a platform is supported it keeps working is something we're committed to).
The project has a number of other third-party Python modules depending on tokenizers; from yesterday's build I got the following version constraint from pip:
`Collecting tokenizers!=0.11.3,<0.13,>=0.11.1`
Not sure why it's not picking 0.12.6; setting pip to prefer binaries installed 0.11.6.
EDIT: answering my own question: https://pypi.org/simple/tokenizers/ goes straight from 0.12.1 to 0.13.0; there is no 0.12.6.
Hmm, interesting. Could you try force-installing 0.12.6 and see if that fixes it?
Could you share your env (Python version + hardware (M1 I guess) + requirements.txt)?
I don't remember the command, but there's a way to make pip explain its decisions regarding versions.
I got confused with 0.11.6, sorry!
And I don't see the arm builds for 0.12; I'm guessing we moved to 0.13 first.
TBH there "shouldn't" be any major differences between 0.12.1 and 0.13, so if you can switch, that might work (I was cautious because we updated the PyO3 bindings version, which triggered a lot of code changes, even though we didn't intend any functional changes).
transformers is probably the one limiting tokenizers (we pin it so tokenizers can make eventual breaking changes). Maybe you could try updating it?
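To see which versions pip actually resolved in your environment before and after updating, a stdlib-only sketch (Python 3.8+):

```python
from importlib.metadata import version, PackageNotFoundError

# Print the installed versions of the two packages involved in the version pin.
for pkg in ("transformers", "tokenizers"):
    try:
        print(pkg, version(pkg))
    except PackageNotFoundError:
        print(pkg, "not installed")
```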
It's a bit convoluted ATM, as different OSs currently require different versions of gfpgan unless you install torch up front. So I do:

```
pip install "torch<1.13" "torchvision<0.14"
```

Main requirements.txt:

```
-r requirements-base.txt
protobuf==3.19.6
torch<1.13.0
torchvision<0.14.0
-e .
```

requirements-base.txt (pip will resolve the versions which match torch):

```
albumentations
dependency_injector==4.40.0
diffusers
einops
eventlet
flask==2.1.3
flask_cors==3.0.10
flask_socketio==5.3.0
flaskwebgui==0.3.7
getpass_asterisk
gfpgan
huggingface-hub
imageio
imageio-ffmpeg
kornia
numpy
omegaconf
opencv-python
pillow
pip>=22
pudb
pyreadline3
pytorch-lightning==1.7.7
realesrgan
scikit-image>=0.19
send2trash
streamlit
taming-transformers-rom1504
test-tube
torch-fidelity
torchmetrics
transformers==4.21.*
git+https://github.com/openai/CLIP.git@main#egg=clip
git+https://github.com/Birch-san/k-diffusion.git@mps#egg=k-diffusion
git+https://github.com/invoke-ai/clipseg.git@models-rename#egg=clipseg
```
I'll have to see why we limit transformers, assuming the reasoning hasn't been lost to history.
> I am on M1 and managed to work around this in the following way: I installed a Rust compiler using brew, and then initialized it: `brew install rustup`, then `rustup-init`. Then I restarted the console and checked that it is installed with `rustc --version`. It turned out you also have to set up the path: `export PATH="$HOME/.cargo/bin:$PATH"`

I used this approach and it worked for me on M2. Thank you so much!
For Windows:
- Install Visual Studio (latest version, 2022)
- Install the Python development workload
- Install the Desktop development with C++ workload