Unable to install hdbscan on Colab.

Status: Open. Raingel opened this issue 1 year ago (69 comments).

Today I got the following error when trying to install hdbscan on Colab:

 error: subprocess-exited-with-error
  
  × Building wheel for hdbscan (pyproject.toml) did not run successfully.
  │ exit code: 1
  ╰─> See above for output.
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
  Building wheel for hdbscan (pyproject.toml) ... error
  ERROR: Failed building wheel for hdbscan
Failed to build hdbscan
ERROR: Could not build wheels for hdbscan, which is required to install pyproject.toml-based projects

It worked fine when I installed it last week.

I also tried to install the previous version of hdbscan (0.8.29), but it still failed.

Raingel, Jul 17 '23 15:07

We're seeing this in our CI builds now as well:

error: subprocess-exited-with-error
  
  × Building wheel for hdbscan (pyproject.toml) did not run successfully.
  │ exit code: 1
  ╰─> [168 lines of output]
      running bdist_wheel
      running build
      running build_py
      creating build
      creating build/lib.linux-x86_64-cpython-38
      creating build/lib.linux-x86_64-cpython-38/hdbscan
      copying hdbscan/validity.py -> build/lib.linux-x86_64-cpython-38/hdbscan
      copying hdbscan/plots.py -> build/lib.linux-x86_64-cpython-38/hdbscan
      copying hdbscan/flat.py -> build/lib.linux-x86_64-cpython-38/hdbscan
      copying hdbscan/prediction.py -> build/lib.linux-x86_64-cpython-38/hdbscan
      copying hdbscan/hdbscan_.py -> build/lib.linux-x86_64-cpython-38/hdbscan
      copying hdbscan/__init__.py -> build/lib.linux-x86_64-cpython-38/hdbscan
      copying hdbscan/robust_single_linkage_.py -> build/lib.linux-x86_64-cpython-38/hdbscan
      creating build/lib.linux-x86_64-cpython-38/hdbscan/tests
      copying hdbscan/tests/test_rsl.py -> build/lib.linux-x86_64-cpython-38/hdbscan/tests
      copying hdbscan/tests/test_prediction_utils.py -> build/lib.linux-x86_64-cpython-38/hdbscan/tests
      copying hdbscan/tests/test_flat.py -> build/lib.linux-x86_64-cpython-38/hdbscan/tests
      copying hdbscan/tests/__init__.py -> build/lib.linux-x86_64-cpython-38/hdbscan/tests
      copying hdbscan/tests/test_hdbscan.py -> build/lib.linux-x86_64-cpython-38/hdbscan/tests
      running build_ext
      Compiling hdbscan/_hdbscan_tree.pyx because it changed.
      [1/1] Cythonizing hdbscan/_hdbscan_tree.pyx
      building 'hdbscan._hdbscan_tree' extension
      creating build/temp.linux-x86_64-cpython-38
      creating build/temp.linux-x86_64-cpython-38/hdbscan
      gcc -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -fPIC -I/home/runner/.local/share/hatch/env/virtual/arize-phoenix/C8K4HrkP/type/include -I/opt/hostedtoolcache/Python/3.8.17/x64/include/python3.8 -I/tmp/pip-build-env-0_kdszx7/overlay/lib/python3.8/site-packages/numpy/core/include -c hdbscan/_hdbscan_tree.c -o build/temp.linux-x86_64-cpython-38/hdbscan/_hdbscan_tree.o
      In file included from /tmp/pip-build-env-0_kdszx7/overlay/lib/python3.8/site-packages/numpy/core/include/numpy/ndarraytypes.h:1830,
                       from /tmp/pip-build-env-0_kdszx7/overlay/lib/python3.8/site-packages/numpy/core/include/numpy/ndarrayobject.h:12,
                       from /tmp/pip-build-env-0_kdszx7/overlay/lib/python3.8/site-packages/numpy/core/include/numpy/arrayobject.h:4,
                       from hdbscan/_hdbscan_tree.c:1097:
      /tmp/pip-build-env-0_kdszx7/overlay/lib/python3.8/site-packages/numpy/core/include/numpy/npy_1_7_deprecated_api.h:17:2: warning: #warning "Using deprecated NumPy API, disable it with " "#define NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION" [-Wcpp]
         17 | #warning "Using deprecated NumPy API, disable it with " \
            |  ^~~~~~~
      gcc -shared -Wl,--rpath=/opt/hostedtoolcache/Python/3.8.17/x64/lib -Wl,--rpath=/opt/hostedtoolcache/Python/3.8.17/x64/lib build/temp.linux-x86_64-cpython-38/hdbscan/_hdbscan_tree.o -L/opt/hostedtoolcache/Python/3.8.17/x64/lib -o build/lib.linux-x86_64-cpython-38/hdbscan/_hdbscan_tree.cpython-38-x86_64-linux-gnu.so
      /tmp/pip-build-env-0_kdszx7/overlay/lib/python3.8/site-packages/Cython/Compiler/Main.py:381: FutureWarning: Cython directive 'language_level' not set, using '3str' for now (Py3). This has changed from earlier releases! File: /tmp/pip-install-sir9k2dg/hdbscan_aa682700701c41ffa445f31aed278805/hdbscan/_hdbscan_tree.pyx
        tree = Parsing.p_module(s, pxd, full_module_name)
      /tmp/pip-build-env-0_kdszx7/overlay/lib/python3.8/site-packages/Cython/Compiler/Main.py:381: FutureWarning: Cython directive 'language_level' not set, using '3str' for now (Py3). This has changed from earlier releases! File: /tmp/pip-install-sir9k2dg/hdbscan_aa682700701c41ffa445f31aed278805/hdbscan/_hdbscan_linkage.pyx
        tree = Parsing.p_module(s, pxd, full_module_name)
      
      Error compiling Cython file:
      ------------------------------------------------------------
      ...
      import numpy as np
      cimport numpy as np
      
      from libc.float cimport DBL_MAX
      
      from dist_metrics cimport DistanceMetric
      ^
      ------------------------------------------------------------
      
      hdbscan/_hdbscan_linkage.pyx:12:0: 'dist_metrics.pxd' not found
      
      Error compiling Cython file:
      ------------------------------------------------------------
      ...
      import numpy as np
      cimport numpy as np
      
      from libc.float cimport DBL_MAX
      
      from dist_metrics cimport DistanceMetric
      ^
      ------------------------------------------------------------
      
      hdbscan/_hdbscan_linkage.pyx:12:0: 'dist_metrics/DistanceMetric.pxd' not found
      
      Error compiling Cython file:
      ------------------------------------------------------------
      ...
      
      
      cpdef np.ndarray[np.double_t, ndim=2] mst_linkage_core_vector(
              np.ndarray[np.double_t, ndim=2, mode='c'] raw_data,
              np.ndarray[np.double_t, ndim=1, mode='c'] core_distances,
              DistanceMetric dist_metric,
              ^
      ------------------------------------------------------------
      
      hdbscan/_hdbscan_linkage.pyx:58:8: 'DistanceMetric' is not a type identifier
      
      Error compiling Cython file:
      ------------------------------------------------------------
      ...
                      continue
      
                  right_value = current_distances[j]
                  right_source = current_sources[j]
      
                  left_value = dist_metric.dist(&raw_data_ptr[num_features *
                                                ^
      ------------------------------------------------------------
      
      hdbscan/_hdbscan_linkage.pyx:129:42: Cannot convert 'double_t *' to Python object
      
      Error compiling Cython file:
      ------------------------------------------------------------
      ...
                  right_value = current_distances[j]
                  right_source = current_sources[j]
      
                  left_value = dist_metric.dist(&raw_data_ptr[num_features *
                                                              current_node],
                                                &raw_data_ptr[num_features * j],
                                                ^
      ------------------------------------------------------------
      
      hdbscan/_hdbscan_linkage.pyx:131:42: Cannot convert 'double_t *' to Python object
      Compiling hdbscan/_hdbscan_linkage.pyx because it changed.
      [1/1] Cythonizing hdbscan/_hdbscan_linkage.pyx
      Traceback (most recent call last):
        File "/home/runner/.local/share/hatch/env/virtual/arize-phoenix/C8K4HrkP/type/lib/python3.8/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 353, in <module>
          main()
        File "/home/runner/.local/share/hatch/env/virtual/arize-phoenix/C8K4HrkP/type/lib/python3.8/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 335, in main
          json_out['return_val'] = hook(**hook_input['kwargs'])
        File "/home/runner/.local/share/hatch/env/virtual/arize-phoenix/C8K4HrkP/type/lib/python3.8/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 251, in build_wheel
          return _build_backend().build_wheel(wheel_directory, config_settings,
        File "/tmp/pip-build-env-0_kdszx7/overlay/lib/python3.8/site-packages/setuptools/build_meta.py", line 416, in build_wheel
          return self._build_with_temp_dir(['bdist_wheel'], '.whl',
        File "/tmp/pip-build-env-0_kdszx7/overlay/lib/python3.8/site-packages/setuptools/build_meta.py", line 401, in _build_with_temp_dir
          self.run_setup()
        File "/tmp/pip-build-env-0_kdszx7/overlay/lib/python3.8/site-packages/setuptools/build_meta.py", line 487, in run_setup
          super(_BuildMetaLegacyBackend,
        File "/tmp/pip-build-env-0_kdszx7/overlay/lib/python3.8/site-packages/setuptools/build_meta.py", line 338, in run_setup
          exec(code, locals())
        File "<string>", line 96, in <module>
        File "/tmp/pip-build-env-0_kdszx7/overlay/lib/python3.8/site-packages/setuptools/__init__.py", line 107, in setup
          return distutils.core.setup(**attrs)
        File "/tmp/pip-build-env-0_kdszx7/overlay/lib/python3.8/site-packages/setuptools/_distutils/core.py", line 185, in setup
          return run_commands(dist)
        File "/tmp/pip-build-env-0_kdszx7/overlay/lib/python3.8/site-packages/setuptools/_distutils/core.py", line 201, in run_commands
          dist.run_commands()
        File "/tmp/pip-build-env-0_kdszx7/overlay/lib/python3.8/site-packages/setuptools/_distutils/dist.py", line 969, in run_commands
          self.run_command(cmd)
        File "/tmp/pip-build-env-0_kdszx7/overlay/lib/python3.8/site-packages/setuptools/dist.py", line 1234, in run_command
          super().run_command(command)
        File "/tmp/pip-build-env-0_kdszx7/overlay/lib/python3.8/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
          cmd_obj.run()
        File "/tmp/pip-build-env-0_kdszx7/overlay/lib/python3.8/site-packages/wheel/bdist_wheel.py", line 343, in run
          self.run_command("build")
        File "/tmp/pip-build-env-0_kdszx7/overlay/lib/python3.8/site-packages/setuptools/_distutils/cmd.py", line 318, in run_command
          self.distribution.run_command(command)
        File "/tmp/pip-build-env-0_kdszx7/overlay/lib/python3.8/site-packages/setuptools/dist.py", line 1234, in run_command
          super().run_command(command)
        File "/tmp/pip-build-env-0_kdszx7/overlay/lib/python3.8/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
          cmd_obj.run()
        File "/tmp/pip-build-env-0_kdszx7/overlay/lib/python3.8/site-packages/setuptools/_distutils/command/build.py", line 131, in run
          self.run_command(cmd_name)
        File "/tmp/pip-build-env-0_kdszx7/overlay/lib/python3.8/site-packages/setuptools/_distutils/cmd.py", line 318, in run_command
          self.distribution.run_command(command)
        File "/tmp/pip-build-env-0_kdszx7/overlay/lib/python3.8/site-packages/setuptools/dist.py", line 1234, in run_command
          super().run_command(command)
        File "/tmp/pip-build-env-0_kdszx7/overlay/lib/python3.8/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
          cmd_obj.run()
        File "<string>", line 26, in run
        File "/tmp/pip-build-env-0_kdszx7/overlay/lib/python3.8/site-packages/setuptools/_distutils/command/build_ext.py", line 345, in run
          self.build_extensions()
        File "/tmp/pip-build-env-0_kdszx7/overlay/lib/python3.8/site-packages/setuptools/_distutils/command/build_ext.py", line 467, in build_extensions
          self._build_extensions_serial()
        File "/tmp/pip-build-env-0_kdszx7/overlay/lib/python3.8/site-packages/setuptools/_distutils/command/build_ext.py", line 493, in _build_extensions_serial
          self.build_extension(ext)
        File "/tmp/pip-build-env-0_kdszx7/overlay/lib/python3.8/site-packages/Cython/Distutils/build_ext.py", line 122, in build_extension
          new_ext = cythonize(
        File "/tmp/pip-build-env-0_kdszx7/overlay/lib/python3.8/site-packages/Cython/Build/Dependencies.py", line 1134, in cythonize
          cythonize_one(*args)
        File "/tmp/pip-build-env-0_kdszx7/overlay/lib/python3.8/site-packages/Cython/Build/Dependencies.py", line 1301, in cythonize_one
          raise CompileError(None, pyx_file)
      Cython.Compiler.Errors.CompileError: hdbscan/_hdbscan_linkage.pyx
      [end of output]

mikeldking, Jul 17 '23 16:07

We've been seeing the same issue since today on Linux x86-64 with Python 3.10. I noticed there weren't any wheels before, so I assume we were already building hdbscan from source. I'm not sure what changed to cause the build failure now.

fvdnabee, Jul 17 '23 16:07

Having this problem as well, installing with Poetry. There were no changes to the lock file, and it was working last week.

MrBeeMovie, Jul 17 '23 16:07

This is also causing issues in Databricks. Cython released a new major version (3.0.0) a few hours ago (https://pypi.org/project/Cython/#history), so that might be causing problems in these managed environments. I tried installing the package from master on WSL and it worked with all Python versions > 3.8 using the newest Cython. Either way, it might be worth pinning all requirements below their next major version, just to be on the safe side.

EDIT: Databricks Runtime 10.4 LTS has issues; 11.3 LTS and 12.2 LTS work fine.

Rhaedonius, Jul 17 '23 16:07
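A minimal sketch of the "pin below the next major version" suggestion, assuming simple numeric version strings. `pin_below_next_major` is a hypothetical helper, not part of pip, Poetry or hdbscan. For 0.x projects the rule is loose: for Cython it yields `<1`, which happens to exclude 3.0.0 only because Cython jumped from 0.29.x straight to 3.0.

```python
def pin_below_next_major(name: str, current: str) -> str:
    """Return a PEP 440 requirement that accepts `current` and newer releases
    but excludes the next major version (hypothetical helper)."""
    # Treat the first dot-separated component as the major version.
    major = int(current.split(".")[0])
    return f"{name}>={current},<{major + 1}"


# Example: pin two versions reported in this thread.
for requirement in (
    pin_below_next_major("scikit-learn", "1.2.2"),
    pin_below_next_major("numpy", "1.25.1"),
):
    print(requirement)
```

The generated strings can go into a requirements file or a `pyproject.toml` dependency list as-is.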

Downgrading Cython to the previous release doesn't work for me; I still get the same error.

kikefdezl, Jul 17 '23 17:07

Same here. Colab doesn't have Cython 3 installed for me anyway.

mikeldking, Jul 17 '23 17:07

I suggested Cython only because its new major release coincided with the errors appearing. It might not be related.

Rhaedonius, Jul 17 '23 17:07

Downgrading Cython to 0.29.36 is also not working for me.

argonaut76, Jul 17 '23 17:07

Having the same issue on Kaggle notebooks.

dafajon, Jul 17 '23 17:07

There was a recent sklearn release that changed some internals hdbscan relied on (the 0.8.30 release was an attempt to fix that). It's possible that this is the issue; can you check which sklearn version you have?

lmcinnes, Jul 17 '23 18:07
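One way to answer the version question from inside the affected environment (a Colab or Kaggle cell, for example) is to query installed distribution metadata with the standard library's `importlib.metadata` (Python 3.8+). The package list below is an assumption. With pip's default build isolation, the Cython reported here is not necessarily the Cython that compiled hdbscan.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(names):
    """Map each distribution name to its installed version, or None if missing."""
    report = {}
    for name in names:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            report[name] = None
    return report


# Assumed list of packages relevant to this issue.
for name, found in installed_versions(
    ["scikit-learn", "Cython", "numpy", "hdbscan"]
).items():
    print(f"{name}: {found or 'not installed'}")
```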

scikit-learn==1.2.2

argonaut76, Jul 17 '23 18:07

Same issue on Ubuntu 18.04 using the Docker image python:3.8.12.

kenho211, Jul 17 '23 18:07

I'm at a bit of a loss, especially if 0.8.29 no longer builds either. I can at least reproduce this locally, but it is unclear how to fix it: none of the code that is now breaking has changed in quite some time, so it isn't clear why it is breaking at all.

lmcinnes, Jul 17 '23 18:07

Okay, I fixed the obvious module-name resolution problems, and that seems to have fixed things locally. I don't understand what changed, or indeed why this particular change is now required, but given how many people are affected I'm going to push these changes out as a 0.8.31 release, which will hopefully solve the problem for at least some of you.

lmcinnes, Jul 17 '23 18:07

I have an idea. This might be caused by isolated builds. When I install the package it pulls down the most recent version of Cython (regardless of what's installed in my environment).

`cython>=0.27` should be updated to `cython>=0.27,<3` to avoid pulling in the latest version of Cython.

(comment is being updated as I'm testing my hypothesis...)

nchepanov, Jul 17 '23 18:07
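PEP 440 specifiers are comma-separated, so `cython>=0.27,<3` is the valid form of the proposed pin. The sketch below is a deliberately simplified checker (numeric release segments only, no pre-releases or wildcards) that shows which Cython versions the pin admits. `satisfies` is a hypothetical helper; real tooling should use `packaging.specifiers.SpecifierSet` instead.

```python
import operator

# Comparison operators, listed so two-character operators are tried first.
_OPS = {
    ">=": operator.ge,
    "<=": operator.le,
    "==": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    "<": operator.lt,
}


def _release(text: str) -> tuple:
    """Parse a numeric release such as '0.29.36' into (0, 29, 36)."""
    parts = [int(piece) for piece in text.split(".")]
    # Drop trailing zeros so that '3' and '3.0.0' compare as equal.
    while len(parts) > 1 and parts[-1] == 0:
        parts.pop()
    return tuple(parts)


def satisfies(version: str, spec: str) -> bool:
    """Return True if `version` meets every comma-separated clause in `spec`."""
    current = _release(version)
    for clause in spec.split(","):
        clause = clause.strip()
        for op, compare in _OPS.items():
            if clause.startswith(op):
                if not compare(current, _release(clause[len(op):].strip())):
                    return False
                break
        else:
            raise ValueError(f"unsupported clause: {clause!r}")
    return True


for candidate in ("0.26.1", "0.29.36", "3.0.0"):
    print(candidate, satisfies(candidate, ">=0.27,<3"))
```

Only 0.29.36 passes. Passing the unseparated form `>=0.27<3` makes this checker raise `ValueError`, because `27<3` is not a number.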

The new patch more or less solved the issue for me: https://github.com/scikit-learn-contrib/hdbscan/releases/tag/0.8.31

thomasjv799, Jul 17 '23 18:07

Confirming that 0.8.31 works for me too, which makes me think @nchepanov's comment is right (i.e. the general release of Cython 3.0 caused the break). The timing lines up too.

mikeldking, Jul 17 '23 19:07

@nchepanov I believe you are correct. While these changes allowed Cython 3 to build hdbscan, there seem to be further issues at runtime. Until I have time to work through all the changes Cython 3 requires, I have added a "<3" requirement for Cython. As far as I can tell, that resolves all the issues. I've pushed that out as 0.8.32, which will hopefully keep things afloat for a while.

Thanks to everyone for flagging the issue and the help tracking down the source of the problem.

lmcinnes, Jul 17 '23 19:07

@nchepanov's isolated-builds idea is closer to what I was thinking; I remembered something about isolated builds but couldn't find it in the Python docs. I changed the build requirement to "cython<3" in pyproject.toml and managed to build hdbscan 0.8.30 on Databricks 10.4 LTS and Colab. The Cython entry in requirements might not be needed, since it isn't a runtime requirement (still testing this).

Rhaedonius, Jul 17 '23 19:07

Regarding isolated builds: pip does not respect already-installed versions of packages listed in `build-system.requires` for PEP 517 packages. You can work around this by installing a working version of Cython (0.29.x) and passing `--no-build-isolation` to `pip install`, which stops pip from installing a newer Cython (3.x) just for the wheel build.

aaron-skydio, Jul 17 '23 19:07
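A sketch of the `--no-build-isolation` workaround, driven from Python via `subprocess` (handy in notebooks). With isolation turned off, pip no longer installs build requirements for you. This sketch assumes the hdbscan build needs NumPy, setuptools and wheel in addition to Cython. The helper names are hypothetical.

```python
import subprocess
import sys


def no_isolation_install_commands(python: str = sys.executable) -> list:
    """Return pip commands that build hdbscan against a pre-installed Cython < 3."""
    return [
        # 1. Install the build dependencies into the current environment.
        #    Assumption: NumPy, setuptools and wheel are needed besides Cython.
        [python, "-m", "pip", "install", "cython<3", "numpy", "setuptools", "wheel"],
        # 2. Build hdbscan with those packages instead of a fresh isolated
        #    build environment (which would pull in the newest Cython).
        [python, "-m", "pip", "install", "--no-build-isolation", "hdbscan"],
    ]


def run(commands) -> None:
    """Execute each command in order, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

Calling `run(no_isolation_install_commands())` performs both steps. The equivalent from a shell is the same two `pip install` invocations.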

0.8.31 is not working for me. I'm running hdbscan inside a Dockerized application and getting the following error:

Traceback (most recent call last):
  File "/usr/src/app/modules/cluster.py", line 26, in fit
    clusterer = HDBSCAN(min_cluster_size=min_cluster_size, min_samples=self.min_samples, cluster_selection_method=self.cluster_selection_method).fit(vectors)
  File "/usr/local/lib/python3.10/dist-packages/hdbscan/hdbscan_.py", line 1205, in fit
    ) = hdbscan(clean_data, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/hdbscan/hdbscan_.py", line 884, in hdbscan
    _tree_to_labels(
  File "/usr/local/lib/python3.10/dist-packages/hdbscan/hdbscan_.py", line 78, in _tree_to_labels
    condensed_tree = condense_tree(single_linkage_tree, min_cluster_size)
  File "hdbscan/_hdbscan_tree.pyx", line 43, in hdbscan._hdbscan_tree.condense_tree
  File "hdbscan/_hdbscan_tree.pyx", line 114, in hdbscan._hdbscan_tree.condense_tree
TypeError: 'numpy.float64' object cannot be interpreted as an integer

I'm using scikit-learn==1.2.2.

argonaut76, Jul 17 '23 19:07
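The `TypeError` means a float reached code that needs an integer, such as `range()` or an index. The build log earlier in this thread shows Cython warning that `language_level` was not set and defaulting to `'3str'`, which gives `/` Python 3 true-division semantics. The pure-Python sketch below shows how true division can produce exactly this class of error. It illustrates the error class only and is not a diagnosis of hdbscan's code; the function names are hypothetical.

```python
def indices_true_division(n: int) -> list:
    # Python 3 semantics: `/` on two ints is true division and returns a float,
    # and range() refuses floats because they are not valid integer indices.
    return list(range(n / 2))


def indices_floor_division(n: int) -> list:
    # `//` is floor division and keeps the result an int, so range() accepts it.
    return list(range(n // 2))


try:
    indices_true_division(6)
except TypeError as exc:
    print(exc)  # 'float' object cannot be interpreted as an integer

print(indices_floor_division(6))  # [0, 1, 2]
```

Passing a `numpy.float64` instead of a plain `float` fails the same way, with `'numpy.float64'` in the message.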

Unfortunately, this is also the same runtime exception I'm hitting with 0.8.32.

mikeldking, Jul 17 '23 19:07

I definitely saw that runtime error with 0.8.31, but in testing it disappeared with 0.8.32. If it is still an issue in 0.8.32, that's not good. The test suite was all green (https://dev.azure.com/lelandmcinnes/HDBSCAN%20builds/_build/results?buildId=901&view=results), so I'm not sure what the lingering issue is. Perhaps try a clean reinstall of 0.8.32?

lmcinnes, Jul 17 '23 19:07

My application runs in Docker. I did a clean rebuild and am still getting the error:

  File "hdbscan/_hdbscan_tree.pyx", line 114, in hdbscan._hdbscan_tree.condense_tree
TypeError: 'numpy.float64' object cannot be interpreted as an integer

I also tried using the most recent scikit-learn release, to no effect.

argonaut76, Jul 17 '23 19:07

@argonaut76 I'm loath to just keep pushing new releases; 0.8.32 seems good on a wide range of platforms. I have, however, pushed some changes to master that may fix your problem. Can you install from GitHub within your Docker build and see whether they resolve your issue?

lmcinnes, Jul 17 '23 20:07
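A sketch of installing the current `master` branch instead of a PyPI release. It assumes the repository is `scikit-learn-contrib/hdbscan` (where the 0.8.31 release link in this thread points). It also assumes `git` is available in the image, since pip needs it for `git+https` URLs, and that hdbscan's dependencies are already installed. `github_install_command` and `install_from_github` are hypothetical helpers.

```python
import subprocess
import sys

# Assumption: the fixes live on the `master` branch of this repository.
REPO_URL = "git+https://github.com/scikit-learn-contrib/hdbscan.git"


def github_install_command(ref: str = "master", python: str = sys.executable) -> list:
    """Build a pip command that installs hdbscan from a Git ref instead of PyPI."""
    return [
        python, "-m", "pip", "install",
        "--force-reinstall",  # replace an already-installed hdbscan release
        "--no-deps",          # assumes numpy, scipy, scikit-learn etc. are present
        f"{REPO_URL}@{ref}",
    ]


def install_from_github(ref: str = "master") -> None:
    """Run the command; raises CalledProcessError if pip fails."""
    subprocess.run(github_install_command(ref), check=True)
```

In a Dockerfile the same thing is a single `RUN pip install --force-reinstall --no-deps git+https://github.com/scikit-learn-contrib/hdbscan.git@master` step.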

@lmcinnes sure, I'll give that a shot.

If it helps, here's some additional context:

As a base Docker image I'm using ubuntu:latest, which is Ubuntu 22.04. I have the following dependencies installed that could affect hdbscan:

Cython==0.29.36
hdbscan==0.8.32
joblib==1.3.1
numpy==1.25.1
scikit-learn==1.3.0
scipy==1.11.1
spacy==3.5.3

argonaut76, Jul 17 '23 20:07

@lmcinnes that worked!

argonaut76, Jul 17 '23 20:07

I'll give it a little while to ensure that the current master works for most people, and then try to push out a 0.8.33 that will hopefully get us over this little hurdle late in the week.

lmcinnes, Jul 17 '23 20:07

Thanks. What an odd problem.

argonaut76, Jul 17 '23 20:07

@lmcinnes: I'm also hitting the same numpy.float64 issue as @argonaut76.

If possible, I'd love a v0.8.33 release soon rather than later in the week!

setu4993, Jul 17 '23 20:07