tokenizers
0.11.5 and 0.11.6 packages not compatible with manylinux2010
The binary wheel packages of 0.11.5 and 0.11.6 (e.g. tokenizers-0.11.5-cp39-cp39-manylinux_2_12_x86_64.manylinux2010_x86_64.whl) are not compatible with the manylinux2010 policy.
The problem is that tokenizers.cpython-39-x86_64-linux-gnu.so has a dependency on libz.so.1, which is included in neither manylinux2010 nor manylinux2014.
Is it possible to fix the packages so we can deploy tokenizers on our minimal systems that don't have libz.so.1?
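For readers who want to reproduce the check, here is a minimal sketch that scans the text printed by `readelf -d` for `NEEDED` entries and reports any library outside the manylinux2010 allow-list. The allow-list is transcribed from PEP 571 from memory and should be verified against the PEP. The function names and the sample output are hypothetical, not taken from the thread.

```python
import re

# Allow-list as recalled from PEP 571 (manylinux2010).
# Assumption: verify against the PEP before relying on it.
MANYLINUX2010_ALLOWED = frozenset({
    "libgcc_s.so.1", "libstdc++.so.6", "libm.so.6", "libdl.so.2",
    "librt.so.1", "libc.so.6", "libnsl.so.1", "libutil.so.1",
    "libpthread.so.0", "libresolv.so.2", "libX11.so.6", "libXext.so.6",
    "libXrender.so.1", "libICE.so.6", "libSM.so.6", "libGL.so.1",
    "libgobject-2.0.so.0", "libgthread-2.0.so.0", "libglib-2.0.so.0",
})

# Matches dynamic-section lines such as:
#  0x0000000000000001 (NEEDED)   Shared library: [libz.so.1]
_NEEDED_RE = re.compile(r"\(NEEDED\)\s+Shared library: \[([^\]]+)\]")


def needed_libraries(readelf_output: str) -> list[str]:
    """Return the DT_NEEDED library names found in `readelf -d` output."""
    return _NEEDED_RE.findall(readelf_output)


def disallowed_libraries(readelf_output: str,
                         allowed=MANYLINUX2010_ALLOWED) -> list[str]:
    """Return the needed libraries that are not on the allow-list."""
    return [lib for lib in needed_libraries(readelf_output)
            if lib not in allowed]


# Hypothetical sample, shaped like `readelf -d` output for the extension.
SAMPLE = """
 0x0000000000000001 (NEEDED)             Shared library: [libz.so.1]
 0x0000000000000001 (NEEDED)             Shared library: [libc.so.6]
 0x0000000000000001 (NEEDED)             Shared library: [libpthread.so.0]
"""

print(disallowed_libraries(SAMPLE))  # ['libz.so.1']
```

In practice the input would come from running `readelf -d` on the `.so` file extracted from the wheel.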
Hi @vgod-dbx ,
Thanks for sharing this.
The library is built automatically on manylinux2010, as done here: https://github.com/huggingface/tokenizers/blob/master/.github/workflows/python-release.yml#L18
The build script is here: https://github.com/huggingface/tokenizers/blob/master/bindings/python/build-wheels.sh
The only thing I see that could bring in libz is openssl-dev, right? Though I don't see anything about it here: https://wiki.openssl.org/index.php/Compilation_and_Installation
Is there anything in there that might explain where libz gets pulled in during the build step?
Also, any suggestions on steps we could take to avoid breaking this in the future would be welcome.
Hi Narsil,
I did some investigation and found that the issue comes from auditwheel. They added libz.so.1 to their policy whitelist a few months ago, which was the wrong thing to do and unfortunately broke the manylinux policies.
A workaround for now is to use an older version of auditwheel (<= 4.0.0). This Docker image, quay.io/pypa/manylinux2010_x86_64@sha256:d02389c35c7e9b6c51a21d71e9e98e6fd57c1225f7a782b3965d5a1e1b3585d4, has auditwheel 4.0.0, which should fix the compatibility issue.
Is it possible for you to use this image for building your wheels until they fix this issue? I'll report this bug back to them and hope they can fix it.
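A build script could guard against picking up a newer auditwheel by accident. The sketch below only encodes the `<= 4.0.0` bound quoted above. The helper names are hypothetical, and the version parsing is deliberately simple (leading numeric components only).

```python
import re

# Bound taken from the comment above: auditwheel <= 4.0.0 predates the
# libz.so.1 allowance. Everything else here is illustrative.
MAX_KNOWN_GOOD = (4, 0, 0)


def parse_version(text: str) -> tuple:
    """Return the leading numeric components of a version string."""
    match = re.match(r"\d+(?:\.\d+)*", text.strip())
    if match is None:
        raise ValueError(f"not a version string: {text!r}")
    return tuple(int(part) for part in match.group().split("."))


def predates_libz_allowance(version_text: str) -> bool:
    """True if the given auditwheel version is <= 4.0.0."""
    parts = parse_version(version_text)
    parts += (0,) * (3 - len(parts))  # pad "4.0" to (4, 0, 0)
    return parts <= MAX_KNOWN_GOOD


print(predates_libz_allowance("4.0.0"))  # True
print(predates_libz_allowance("5.0.0"))  # False
```

Inside the builder image, the installed version can be read with `importlib.metadata.version("auditwheel")` (Python 3.8+) and passed to the check.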
Hi @vgod-dbx ,
Thank you very much for this investigation! It saved me a lot of time for sure.
It seems the merge happened quite a while ago (15 Sep 2021). Since we're using manylinux2010_x86_64:latest, I would rather we kept it that way so we're up to date, at least security-wise, when building.
If upstream doesn't respond we'll see what we can do.
Also, rereading the PEP you linked, it seems it is now being superseded by PEP 600 (https://www.python.org/dev/peps/pep-0600/), which clearly seems to indicate that the fixed list of dependencies was more or less dropped. I am by no means an expert on this subject, so I'll just follow the discussions here and upstream.
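To make the PEP 600 connection concrete: the wheel named in the first post carries two platform tags, a PEP 600 tag (`manylinux_2_12_x86_64`) and its legacy alias (`manylinux2010_x86_64`). The sketch below splits a wheel filename into its platform tags and maps each to the glibc floor it implies, using the legacy aliases that PEP 600 defines. The function names are hypothetical.

```python
import re

# Legacy tag -> (glibc major, glibc minor), per the aliases in PEP 600.
LEGACY_ALIASES = {
    "manylinux1": (2, 5),
    "manylinux2010": (2, 12),
    "manylinux2014": (2, 17),
}

_PEP600_RE = re.compile(r"^manylinux_(\d+)_(\d+)_(.+)$")


def wheel_platform_tags(filename: str) -> list[str]:
    """Return the platform tags from a wheel filename.

    The platform tag is the last dash-separated field of the filename;
    compressed tag sets are joined with '.'.
    """
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename!r}")
    stem = filename[: -len(".whl")]
    return stem.rsplit("-", 1)[-1].split(".")


def glibc_floor(platform_tag: str) -> tuple:
    """Return (glibc_major, glibc_minor, arch) for a manylinux tag."""
    match = _PEP600_RE.match(platform_tag)
    if match:
        return (int(match[1]), int(match[2]), match[3])
    for legacy, version in LEGACY_ALIASES.items():
        prefix = legacy + "_"
        if platform_tag.startswith(prefix):
            return (*version, platform_tag[len(prefix):])
    raise ValueError(f"not a manylinux tag: {platform_tag!r}")


WHEEL = ("tokenizers-0.11.5-cp39-cp39-"
         "manylinux_2_12_x86_64.manylinux2010_x86_64.whl")

for tag in wheel_platform_tags(WHEEL):
    print(tag, glibc_floor(tag))
```

Both tags resolve to glibc 2.12 on x86_64, which is why they can appear side by side on the same wheel.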
Thanks Narsil. I understand that you want to keep the builder image up to date. I'll wait for auditwheel's response and build our own wheel package in the meantime.
Also, rereading the PEP you linked, it seems it is now being superseded by PEP 600
Indeed. cross-ref auditwheel answer
I would rather we kept it that way so we're up to date, at least security-wise, when building.
You won't get any security updates using an end-of-life distro: as mentioned in the manylinux README, the manylinux2010 image is based on CentOS 6, which reached EOL on November 30th, 2020. Grafting OpenSSL from an EOL distribution still seems like an issue here. If you want to graft OpenSSL rather than rebuild it yourself, you're probably better off security-wise going with manylinux2014, which is based on CentOS 7 and will reach EOL on June 30th, 2024.
Updating to manylinux2014 has implications for the version of pip required to install such tagged wheels.
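As a rough illustration of the pip implication, the sketch below maps each manylinux tag family to the minimum pip version that recognizes it. The version numbers are as recalled from the manylinux README and should be checked there. The helper names are hypothetical.

```python
# Minimum pip versions per tag family, as recalled from the manylinux
# README. Assumption: confirm against the README before relying on them.
MIN_PIP = {
    "manylinux1": (8, 1, 0),
    "manylinux2010": (19, 0),
    "manylinux2014": (19, 3),
    "manylinux_x_y": (20, 3),  # PEP 600 style tags, e.g. manylinux_2_17
}


def tag_family(platform_tag: str) -> str:
    """Classify a platform tag into one of the MIN_PIP families."""
    if platform_tag.startswith("manylinux_"):
        return "manylinux_x_y"
    for legacy in ("manylinux2010", "manylinux2014", "manylinux1"):
        if platform_tag.startswith(legacy + "_"):
            return legacy
    raise ValueError(f"not a manylinux tag: {platform_tag!r}")


def minimum_pip(platform_tags) -> tuple:
    """Lowest pip version able to install a wheel carrying these tags.

    pip only needs to recognize one of the wheel's tags, so the
    requirement for the whole wheel is the minimum over its tags.
    """
    return min(MIN_PIP[tag_family(tag)] for tag in platform_tags)


# A wheel tagged for manylinux2010 vs. one tagged for manylinux2014.
print(minimum_pip(["manylinux_2_12_x86_64", "manylinux2010_x86_64"]))
print(minimum_pip(["manylinux_2_17_x86_64", "manylinux2014_x86_64"]))
```

Under these assumptions, moving from manylinux2010 to manylinux2014 raises the floor from pip 19.0 to pip 19.3 for dual-tagged wheels.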
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.