
[Bug] pip install data-prep-toolkit-transforms[all]==0.2.2 gets error

Open daw3rd opened this issue 1 year ago • 6 comments

Search before asking

  • [X] I searched the issues and found no similar issues.

Component

Other

What happened + What you expected to happen

I created a venv and tried to install the wheel with all transforms, but got an error while installing fasttext, which seems to require wheel.

Collecting data-prep-toolkit-transforms==0.2.2 (from data-prep-toolkit-transforms[all]==0.2.2)
  Downloading data_prep_toolkit_transforms-0.2.2-1-py3-none-any.whl.metadata (10 kB)
Collecting data-prep-toolkit>=0.2.2 (from data-prep-toolkit-transforms==0.2.2->data-prep-toolkit-transforms[all]==0.2.2)
  Downloading data_prep_toolkit-0.2.2-py3-none-any.whl.metadata (2.2 kB)
Collecting bs4==0.0.2 (from data-prep-toolkit-transforms[all]==0.2.2)
  Downloading bs4-0.0.2-py2.py3-none-any.whl.metadata (411 bytes)
Collecting transformers==4.38.2 (from data-prep-toolkit-transforms[all]==0.2.2)
  Downloading transformers-4.38.2-py3-none-any.whl.metadata (130 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 130.7/130.7 kB 4.9 MB/s eta 0:00:00
Collecting parameterized (from data-prep-toolkit-transforms[all]==0.2.2)
  Downloading parameterized-0.9.0-py2.py3-none-any.whl.metadata (18 kB)
Collecting pandas (from data-prep-toolkit-transforms[all]==0.2.2)
  Downloading pandas-2.2.3-cp311-cp311-macosx_11_0_arm64.whl.metadata (89 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 89.9/89.9 kB 33.9 MB/s eta 0:00:00
Collecting docling-core==2.3.0 (from data-prep-toolkit-transforms[all]==0.2.2)
  Downloading docling_core-2.3.0-py3-none-any.whl.metadata (5.4 kB)
Collecting pydantic<2.10.0,>=2.0.0 (from data-prep-toolkit-transforms[all]==0.2.2)
  Downloading pydantic-2.9.2-py3-none-any.whl.metadata (149 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 149.4/149.4 kB 41.0 MB/s eta 0:00:00
Collecting llama-index-core<0.12.0,>=0.11.22 (from data-prep-toolkit-transforms[all]==0.2.2)
  Downloading llama_index_core-0.11.23-py3-none-any.whl.metadata (2.5 kB)
Collecting fasttext==0.9.2 (from data-prep-toolkit-transforms[all]==0.2.2)
  Downloading fasttext-0.9.2.tar.gz (68 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 68.8/68.8 kB 308.0 MB/s eta 0:00:00
  Installing build dependencies ... done
  Getting requirements to build wheel ... error
  error: subprocess-exited-with-error
  
  × Getting requirements to build wheel did not run successfully.
  │ exit code: 1
  ╰─> [28 lines of output]
      /Users/dawood/dpk/venv/bin/python3.11: No module named pip
      Traceback (most recent call last):
        File "<string>", line 38, in __init__
      ModuleNotFoundError: No module named 'pybind11'
      
      During handling of the above exception, another exception occurred:
      
      Traceback (most recent call last):
        File "/Users/dawood/dpk/venv/lib/python3.11/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 353, in <module>
          main()
        File "/Users/dawood/dpk/venv/lib/python3.11/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 335, in main
          json_out['return_val'] = hook(**hook_input['kwargs'])
                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        File "/Users/dawood/dpk/venv/lib/python3.11/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 118, in get_requires_for_build_wheel
          return hook(config_settings)
                 ^^^^^^^^^^^^^^^^^^^^^
        File "/private/var/folders/tc/l4tdn6zn1q57q5cw4vqzlgxr0000gn/T/pip-build-env-y3ndi6zd/overlay/lib/python3.11/site-packages/setuptools/build_meta.py", line 334, in get_requires_for_build_wheel
          return self._get_build_requires(config_settings, requirements=[])
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        File "/private/var/folders/tc/l4tdn6zn1q57q5cw4vqzlgxr0000gn/T/pip-build-env-y3ndi6zd/overlay/lib/python3.11/site-packages/setuptools/build_meta.py", line 304, in _get_build_requires
          self.run_setup()
        File "/private/var/folders/tc/l4tdn6zn1q57q5cw4vqzlgxr0000gn/T/pip-build-env-y3ndi6zd/overlay/lib/python3.11/site-packages/setuptools/build_meta.py", line 522, in run_setup
          super().run_setup(setup_script=setup_script)
        File "/private/var/folders/tc/l4tdn6zn1q57q5cw4vqzlgxr0000gn/T/pip-build-env-y3ndi6zd/overlay/lib/python3.11/site-packages/setuptools/build_meta.py", line 320, in run_setup
          exec(code, locals())
        File "<string>", line 72, in <module>
        File "<string>", line 41, in __init__
      RuntimeError: pybind11 install failed.
      [end of output]
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error

× Getting requirements to build wheel did not run successfully.
│ exit code: 1
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.

Reproduction script

rm -rf venv
python -m venv venv
source venv/bin/activate
pip install --no-cache-dir data-prep-toolkit-transforms[all]==0.2.2

Adding pip install wheel before the above pip install fixes it. One solution may be to add wheel as a dependency in language/lang_id, which requires fasttext, the apparent source of this problem.

Anything else

No response

OS

MacOS (limited support)

Python

3.11.x

Are you willing to submit a PR?

  • [ ] Yes I am willing to submit a PR!

daw3rd avatar Dec 12 '24 18:12 daw3rd

I have seen this error when the C/C++ toolchains are not present.

If we use conda, these commands will install the required libraries.

conda install gcc_linux-64
conda install gxx_linux-64

shivdeep-singh-ibm avatar Dec 17 '24 13:12 shivdeep-singh-ibm
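
The two causes suggested so far (a missing wheel package and a missing C/C++ toolchain) can both be checked before running the install. The following is a minimal sketch, not part of data-prep-kit: the function name missing_build_prereqs and the list of compiler names are assumptions chosen for illustration, and passing this check does not guarantee that fasttext will build.

```python
import importlib.util
import shutil


def missing_build_prereqs(modules=("pip", "wheel"),
                          compilers=("c++", "g++", "clang++")):
    """Return a list of problems; an empty list means nothing obvious is missing.

    Hypothetical helper: the module and compiler names are illustrative
    defaults, not requirements documented by fasttext or data-prep-kit.
    """
    problems = []
    for name in modules:
        # find_spec returns None when a top-level module cannot be located
        if importlib.util.find_spec(name) is None:
            problems.append(f"Python module not importable: {name}")
    # shutil.which returns None when the executable is not on PATH
    if not any(shutil.which(c) for c in compilers):
        problems.append(
            "no C++ compiler found on PATH (tried: " + ", ".join(compilers) + ")"
        )
    return problems


if __name__ == "__main__":
    for problem in missing_build_prereqs():
        print(problem)
```

Run inside the activated venv, this prints one line per missing prerequisite and nothing if both the modules and a compiler are found.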

@daw3rd Please submit a PR with the recommended addition of pip install wheel to the README file AFTER the release of 1.0.0. Thanks.

shahrokhDaijavad avatar Jan 07 '25 16:01 shahrokhDaijavad

Sadly, pip install wheel is only a fix on macOS. It did not work on Red Hat, for example.

daw3rd avatar Jan 09 '25 21:01 daw3rd

OK, let's capture the work-around that Burn has found for RH (on CCC):

I found a suggestion to pip install fasttext-wheel which succeeded but the data-prep-kit install still tried to build it. I noticed that the env site-packages dir had both fasttext & fasttext-wheel-0.9.2.dist-info and when I changed that to fasttext-0.9.2.dist-info the data-prep-kit install worked!!

But I think using conda to install gcc v11 is a somewhat better solution: conda install 'gcc_linux-64<12'

shahrokhDaijavad avatar Jan 09 '25 22:01 shahrokhDaijavad
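
A likely reading of the work-around above (an assumption, not confirmed in this thread) is that pip decides whether fasttext==0.9.2 is already satisfied from the installed distribution metadata, not from whether the fasttext module can be imported. The sketch below reports both views side by side; the function name describe_fasttext_install is hypothetical.

```python
import importlib.util
from importlib import metadata


def describe_fasttext_install(dist_names=("fasttext", "fasttext-wheel"),
                              module="fasttext"):
    """Report whether the module is importable and which distributions are registered.

    Hypothetical helper for diagnosis only; it does not modify the environment.
    """
    report = {"module_importable": importlib.util.find_spec(module) is not None}
    for name in dist_names:
        try:
            # Installed version as recorded in the distribution's metadata
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # No installed distribution is registered under this name
            report[name] = None
    return report


if __name__ == "__main__":
    for key, value in describe_fasttext_install().items():
        print(f"{key}: {value}")
```

If the module is importable but only fasttext-wheel has a version, that would be consistent with the install still trying to build fasttext from source. Installing a toolchain, as suggested above, avoids hand-editing site-packages.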

@shahrokhDaijavad Does this need to be called out in the README?

agoyal26 avatar Mar 11 '25 04:03 agoyal26

@agoyal26 We are already mentioning what has to be done with fasttext (the root cause of this issue) in 2 places:

For Linux machines: https://github.com/IBM/data-prep-kit/blob/dev/doc/quick-start/quick-start.md
For Windows machines: https://github.com/IBM/data-prep-kit/blob/dev/doc/quick-start/quick-start.md#running-transforms-on-windows

@touma-I Can we consider this bug fixed and close it?

shahrokhDaijavad avatar Mar 11 '25 16:03 shahrokhDaijavad