[Bug] pip install data-prep-toolkit-transforms[all]==0.2.2 gets error
Search before asking
- [X] I searched the issues and found no similar issues.
Component
Other
What happened + What you expected to happen
I created a venv and tried to install the wheel with all transforms, but got an error while installing fasttext, which seems to require wheel.
Collecting data-prep-toolkit-transforms==0.2.2 (from data-prep-toolkit-transforms[all]==0.2.2)
Downloading data_prep_toolkit_transforms-0.2.2-1-py3-none-any.whl.metadata (10 kB)
Collecting data-prep-toolkit>=0.2.2 (from data-prep-toolkit-transforms==0.2.2->data-prep-toolkit-transforms[all]==0.2.2)
Downloading data_prep_toolkit-0.2.2-py3-none-any.whl.metadata (2.2 kB)
Collecting bs4==0.0.2 (from data-prep-toolkit-transforms[all]==0.2.2)
Downloading bs4-0.0.2-py2.py3-none-any.whl.metadata (411 bytes)
Collecting transformers==4.38.2 (from data-prep-toolkit-transforms[all]==0.2.2)
Downloading transformers-4.38.2-py3-none-any.whl.metadata (130 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 130.7/130.7 kB 4.9 MB/s eta 0:00:00
Collecting parameterized (from data-prep-toolkit-transforms[all]==0.2.2)
Downloading parameterized-0.9.0-py2.py3-none-any.whl.metadata (18 kB)
Collecting pandas (from data-prep-toolkit-transforms[all]==0.2.2)
Downloading pandas-2.2.3-cp311-cp311-macosx_11_0_arm64.whl.metadata (89 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 89.9/89.9 kB 33.9 MB/s eta 0:00:00
Collecting docling-core==2.3.0 (from data-prep-toolkit-transforms[all]==0.2.2)
Downloading docling_core-2.3.0-py3-none-any.whl.metadata (5.4 kB)
Collecting pydantic<2.10.0,>=2.0.0 (from data-prep-toolkit-transforms[all]==0.2.2)
Downloading pydantic-2.9.2-py3-none-any.whl.metadata (149 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 149.4/149.4 kB 41.0 MB/s eta 0:00:00
Collecting llama-index-core<0.12.0,>=0.11.22 (from data-prep-toolkit-transforms[all]==0.2.2)
Downloading llama_index_core-0.11.23-py3-none-any.whl.metadata (2.5 kB)
Collecting fasttext==0.9.2 (from data-prep-toolkit-transforms[all]==0.2.2)
Downloading fasttext-0.9.2.tar.gz (68 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 68.8/68.8 kB 308.0 MB/s eta 0:00:00
Installing build dependencies ... done
Getting requirements to build wheel ... error
error: subprocess-exited-with-error
× Getting requirements to build wheel did not run successfully.
│ exit code: 1
╰─> [28 lines of output]
/Users/dawood/dpk/venv/bin/python3.11: No module named pip
Traceback (most recent call last):
File "<string>", line 38, in __init__
ModuleNotFoundError: No module named 'pybind11'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/dawood/dpk/venv/lib/python3.11/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 353, in <module>
main()
File "/Users/dawood/dpk/venv/lib/python3.11/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 335, in main
json_out['return_val'] = hook(**hook_input['kwargs'])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/dawood/dpk/venv/lib/python3.11/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 118, in get_requires_for_build_wheel
return hook(config_settings)
^^^^^^^^^^^^^^^^^^^^^
File "/private/var/folders/tc/l4tdn6zn1q57q5cw4vqzlgxr0000gn/T/pip-build-env-y3ndi6zd/overlay/lib/python3.11/site-packages/setuptools/build_meta.py", line 334, in get_requires_for_build_wheel
return self._get_build_requires(config_settings, requirements=[])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/private/var/folders/tc/l4tdn6zn1q57q5cw4vqzlgxr0000gn/T/pip-build-env-y3ndi6zd/overlay/lib/python3.11/site-packages/setuptools/build_meta.py", line 304, in _get_build_requires
self.run_setup()
File "/private/var/folders/tc/l4tdn6zn1q57q5cw4vqzlgxr0000gn/T/pip-build-env-y3ndi6zd/overlay/lib/python3.11/site-packages/setuptools/build_meta.py", line 522, in run_setup
super().run_setup(setup_script=setup_script)
File "/private/var/folders/tc/l4tdn6zn1q57q5cw4vqzlgxr0000gn/T/pip-build-env-y3ndi6zd/overlay/lib/python3.11/site-packages/setuptools/build_meta.py", line 320, in run_setup
exec(code, locals())
File "<string>", line 72, in <module>
File "<string>", line 41, in __init__
RuntimeError: pybind11 install failed.
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error
× Getting requirements to build wheel did not run successfully.
│ exit code: 1
╰─> See above for output.
note: This error originates from a subprocess, and is likely not a problem with pip.
Reproduction script
rm -rf venv
python -m venv venv
source venv/bin/activate
pip install --no-cache-dir data-prep-toolkit-transforms[all]==0.2.2
Adding
pip install wheel
before the pip install above fixes it. One possible solution is to add wheel as a dependency of language/lang_id, which requires fasttext, the apparent source of this problem.
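The work-around described above can be sketched as a small shell helper. The function name install_transforms_with_wheel and its optional installer-command argument are illustrative assumptions, not part of data-prep-kit; the package name and version are taken from the reproduction script. Later comments in this thread report that this only helps on macOS.

```shell
# Sketch of the work-around: install wheel first, then the transforms.
# install_transforms_with_wheel is a hypothetical helper name.
# Arguments (optional): the installer command to use; defaults to pip.
install_transforms_with_wheel() {
    [ "$#" -gt 0 ] || set -- pip
    # Install wheel before anything that has to be built from source.
    "$@" install wheel &&
        "$@" install --no-cache-dir "data-prep-toolkit-transforms[all]==0.2.2"
}

# Usage, inside a fresh venv:
#   python -m venv venv && . venv/bin/activate
#   install_transforms_with_wheel
# Dry run that only prints the two install commands:
#   install_transforms_with_wheel echo
```

Passing echo as the installer command is a cheap way to confirm the order of the two installs without touching the network.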
Anything else
No response
OS
MacOS (limited support)
Python
3.11.x
Are you willing to submit a PR?
- [ ] Yes I am willing to submit a PR!
I have seen this error when C/C++ toolchains are not present.
If we use conda, these commands will install the required libraries.
conda install gcc_linux-64
conda install gxx_linux-64
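A quick way to check for a missing toolchain before running the install might look like the sketch below. The helper name have_cxx_compiler and the list of compiler names it tries are assumptions for illustration; it also looks at the CXX environment variable, on the assumption that an activated conda compiler package exports it. Finding a compiler does not prove the fasttext build will succeed.

```shell
# Report whether a C++ compiler is visible to the current shell.
# have_cxx_compiler is a hypothetical helper name; the candidate list
# (value of $CXX, then c++, g++, clang++) is an assumption.
have_cxx_compiler() {
    for cxx in "${CXX:-}" c++ g++ clang++; do
        # Skip the empty entry produced when CXX is unset.
        [ -n "$cxx" ] || continue
        if command -v "$cxx" >/dev/null 2>&1; then
            echo "found C++ compiler: $cxx"
            return 0
        fi
    done
    echo "no C++ compiler found on PATH" >&2
    return 1
}

# Usage:
#   have_cxx_compiler || echo "install a C/C++ toolchain first"
```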
@daw3rd Please submit a PR with the recommended addition of pip install wheel to the README file AFTER the release of 1.0.0. Thanks.
Sadly, pip install wheel is only a fix on macOS. It did not work on Red Hat, for example.
OK, let's capture the work-around that Burn has found for RH (on CCC):
I found a suggestion to pip install fasttext-wheel, which succeeded, but the data-prep-kit install still tried to build fasttext. I noticed that the env's site-packages dir contained both fasttext and fasttext-wheel-0.9.2.dist-info; when I renamed the latter to fasttext-0.9.2.dist-info, the data-prep-kit install worked!
But I think using conda to install gcc v11 is a somewhat better solution:
conda install 'gcc_linux-64<12'
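For reference, the rename work-around quoted in this comment could be sketched as follows. The helper name rename_fasttext_wheel_metadata is hypothetical, and the directory names come from the quote; accepting an underscore spelling of the source directory is an extra assumption, since the name on disk may differ. The comment itself prefers the conda route, so treat this as a last resort.

```shell
# Sketch of the rename work-around quoted above.
# rename_fasttext_wheel_metadata is a hypothetical helper name.
# Argument: path to the environment's site-packages directory.
rename_fasttext_wheel_metadata() {
    site_packages=$1
    target="$site_packages/fasttext-0.9.2.dist-info"
    # The quote names fasttext-wheel-0.9.2.dist-info; also matching an
    # underscore spelling is an assumption.
    for src in "$site_packages"/fasttext[-_]wheel-0.9.2.dist-info; do
        [ -d "$src" ] || continue
        if [ -e "$target" ]; then
            echo "already present: $target" >&2
            return 1
        fi
        mv "$src" "$target"
        echo "renamed $src -> $target"
        return 0
    done
    echo "no fasttext-wheel dist-info found in $site_packages" >&2
    return 1
}

# Usage (after pip install fasttext-wheel, inside the active env):
#   rename_fasttext_wheel_metadata \
#       "$(python -c 'import site; print(site.getsitepackages()[0])')"
```

The function refuses to overwrite an existing fasttext-0.9.2.dist-info directory, so it is safe to run twice.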
@shahrokhDaijavad Does this need to be called out in the README?
@agoyal26 We are already mentioning what has to be done with fasttext (the root cause of this issue) in 2 places:
For Linux machines: https://github.com/IBM/data-prep-kit/blob/dev/doc/quick-start/quick-start.md
For Windows machines: https://github.com/IBM/data-prep-kit/blob/dev/doc/quick-start/quick-start.md#running-transforms-on-windows
@touma-I Can we consider this bug fixed and close it?