Unable to install hdbscan on Colab.
Today I hit the following error when trying to install hdbscan on Colab.
error: subprocess-exited-with-error
× Building wheel for hdbscan (pyproject.toml) did not run successfully.
│ exit code: 1
╰─> See above for output.
note: This error originates from a subprocess, and is likely not a problem with pip.
Building wheel for hdbscan (pyproject.toml) ... error
ERROR: Failed building wheel for hdbscan
Failed to build hdbscan
ERROR: Could not build wheels for hdbscan, which is required to install pyproject.toml-based projects
It worked fine when I installed it last week.
I also tried installing the previous version of hdbscan (0.8.29), but it failed in the same way.
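For reference, these were plain pip installs, nothing exotic (on Colab, prefixed with !):

```
pip install hdbscan          # fails as above
pip install hdbscan==0.8.29  # also fails
```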
Seeing this on our CI builds now as well:
error: subprocess-exited-with-error
× Building wheel for hdbscan (pyproject.toml) did not run successfully.
│ exit code: 1
╰─> [168 lines of output]
running bdist_wheel
running build
running build_py
creating build
creating build/lib.linux-x86_64-cpython-38
creating build/lib.linux-x86_64-cpython-38/hdbscan
copying hdbscan/validity.py -> build/lib.linux-x86_64-cpython-38/hdbscan
copying hdbscan/plots.py -> build/lib.linux-x86_64-cpython-38/hdbscan
copying hdbscan/flat.py -> build/lib.linux-x86_64-cpython-38/hdbscan
copying hdbscan/prediction.py -> build/lib.linux-x86_64-cpython-38/hdbscan
copying hdbscan/hdbscan_.py -> build/lib.linux-x86_64-cpython-38/hdbscan
copying hdbscan/__init__.py -> build/lib.linux-x86_64-cpython-38/hdbscan
copying hdbscan/robust_single_linkage_.py -> build/lib.linux-x86_64-cpython-38/hdbscan
creating build/lib.linux-x86_64-cpython-38/hdbscan/tests
copying hdbscan/tests/test_rsl.py -> build/lib.linux-x86_64-cpython-38/hdbscan/tests
copying hdbscan/tests/test_prediction_utils.py -> build/lib.linux-x86_64-cpython-38/hdbscan/tests
copying hdbscan/tests/test_flat.py -> build/lib.linux-x86_64-cpython-38/hdbscan/tests
copying hdbscan/tests/__init__.py -> build/lib.linux-x86_64-cpython-38/hdbscan/tests
copying hdbscan/tests/test_hdbscan.py -> build/lib.linux-x86_64-cpython-38/hdbscan/tests
running build_ext
Compiling hdbscan/_hdbscan_tree.pyx because it changed.
[1/1] Cythonizing hdbscan/_hdbscan_tree.pyx
building 'hdbscan._hdbscan_tree' extension
creating build/temp.linux-x86_64-cpython-38
creating build/temp.linux-x86_64-cpython-38/hdbscan
gcc -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -fPIC -I/home/runner/.local/share/hatch/env/virtual/arize-phoenix/C8K4HrkP/type/include -I/opt/hostedtoolcache/Python/3.8.17/x64/include/python3.8 -I/tmp/pip-build-env-0_kdszx7/overlay/lib/python3.8/site-packages/numpy/core/include -c hdbscan/_hdbscan_tree.c -o build/temp.linux-x86_64-cpython-38/hdbscan/_hdbscan_tree.o
In file included from /tmp/pip-build-env-0_kdszx7/overlay/lib/python3.8/site-packages/numpy/core/include/numpy/ndarraytypes.h:1830,
from /tmp/pip-build-env-0_kdszx7/overlay/lib/python3.8/site-packages/numpy/core/include/numpy/ndarrayobject.h:12,
from /tmp/pip-build-env-0_kdszx7/overlay/lib/python3.8/site-packages/numpy/core/include/numpy/arrayobject.h:4,
from hdbscan/_hdbscan_tree.c:1097:
/tmp/pip-build-env-0_kdszx7/overlay/lib/python3.8/site-packages/numpy/core/include/numpy/npy_1_7_deprecated_api.h:17:2: warning: #warning "Using deprecated NumPy API, disable it with " "#define NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION" [-Wcpp]
17 | #warning "Using deprecated NumPy API, disable it with " \
| ^~~~~~~
gcc -shared -Wl,--rpath=/opt/hostedtoolcache/Python/3.8.17/x64/lib -Wl,--rpath=/opt/hostedtoolcache/Python/3.8.17/x64/lib build/temp.linux-x86_64-cpython-38/hdbscan/_hdbscan_tree.o -L/opt/hostedtoolcache/Python/3.8.17/x64/lib -o build/lib.linux-x86_64-cpython-38/hdbscan/_hdbscan_tree.cpython-38-x86_64-linux-gnu.so
/tmp/pip-build-env-0_kdszx7/overlay/lib/python3.8/site-packages/Cython/Compiler/Main.py:381: FutureWarning: Cython directive 'language_level' not set, using '3str' for now (Py3). This has changed from earlier releases! File: /tmp/pip-install-sir9k2dg/hdbscan_aa682700701c41ffa445f31aed278805/hdbscan/_hdbscan_tree.pyx
tree = Parsing.p_module(s, pxd, full_module_name)
/tmp/pip-build-env-0_kdszx7/overlay/lib/python3.8/site-packages/Cython/Compiler/Main.py:381: FutureWarning: Cython directive 'language_level' not set, using '3str' for now (Py3). This has changed from earlier releases! File: /tmp/pip-install-sir9k2dg/hdbscan_aa682700701c41ffa445f31aed278805/hdbscan/_hdbscan_linkage.pyx
tree = Parsing.p_module(s, pxd, full_module_name)
Error compiling Cython file:
------------------------------------------------------------
...
import numpy as np
cimport numpy as np
from libc.float cimport DBL_MAX
from dist_metrics cimport DistanceMetric
^
------------------------------------------------------------
hdbscan/_hdbscan_linkage.pyx:12:0: 'dist_metrics.pxd' not found
Error compiling Cython file:
------------------------------------------------------------
...
import numpy as np
cimport numpy as np
from libc.float cimport DBL_MAX
from dist_metrics cimport DistanceMetric
^
------------------------------------------------------------
hdbscan/_hdbscan_linkage.pyx:12:0: 'dist_metrics/DistanceMetric.pxd' not found
Error compiling Cython file:
------------------------------------------------------------
...
cpdef np.ndarray[np.double_t, ndim=2] mst_linkage_core_vector(
np.ndarray[np.double_t, ndim=2, mode='c'] raw_data,
np.ndarray[np.double_t, ndim=1, mode='c'] core_distances,
DistanceMetric dist_metric,
^
------------------------------------------------------------
hdbscan/_hdbscan_linkage.pyx:58:8: 'DistanceMetric' is not a type identifier
Error compiling Cython file:
------------------------------------------------------------
...
continue
right_value = current_distances[j]
right_source = current_sources[j]
left_value = dist_metric.dist(&raw_data_ptr[num_features *
^
------------------------------------------------------------
hdbscan/_hdbscan_linkage.pyx:129:42: Cannot convert 'double_t *' to Python object
Error compiling Cython file:
------------------------------------------------------------
...
right_value = current_distances[j]
right_source = current_sources[j]
left_value = dist_metric.dist(&raw_data_ptr[num_features *
current_node],
&raw_data_ptr[num_features * j],
^
------------------------------------------------------------
hdbscan/_hdbscan_linkage.pyx:131:42: Cannot convert 'double_t *' to Python object
Compiling hdbscan/_hdbscan_linkage.pyx because it changed.
[1/1] Cythonizing hdbscan/_hdbscan_linkage.pyx
Traceback (most recent call last):
File "/home/runner/.local/share/hatch/env/virtual/arize-phoenix/C8K4HrkP/type/lib/python3.8/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 353, in <module>
main()
File "/home/runner/.local/share/hatch/env/virtual/arize-phoenix/C8K4HrkP/type/lib/python3.8/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 335, in main
json_out['return_val'] = hook(**hook_input['kwargs'])
File "/home/runner/.local/share/hatch/env/virtual/arize-phoenix/C8K4HrkP/type/lib/python3.8/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 251, in build_wheel
return _build_backend().build_wheel(wheel_directory, config_settings,
File "/tmp/pip-build-env-0_kdszx7/overlay/lib/python3.8/site-packages/setuptools/build_meta.py", line 416, in build_wheel
return self._build_with_temp_dir(['bdist_wheel'], '.whl',
File "/tmp/pip-build-env-0_kdszx7/overlay/lib/python3.8/site-packages/setuptools/build_meta.py", line 401, in _build_with_temp_dir
self.run_setup()
File "/tmp/pip-build-env-0_kdszx7/overlay/lib/python3.8/site-packages/setuptools/build_meta.py", line 487, in run_setup
super(_BuildMetaLegacyBackend,
File "/tmp/pip-build-env-0_kdszx7/overlay/lib/python3.8/site-packages/setuptools/build_meta.py", line 338, in run_setup
exec(code, locals())
File "<string>", line 96, in <module>
File "/tmp/pip-build-env-0_kdszx7/overlay/lib/python3.8/site-packages/setuptools/__init__.py", line 107, in setup
return distutils.core.setup(**attrs)
File "/tmp/pip-build-env-0_kdszx7/overlay/lib/python3.8/site-packages/setuptools/_distutils/core.py", line 185, in setup
return run_commands(dist)
File "/tmp/pip-build-env-0_kdszx7/overlay/lib/python3.8/site-packages/setuptools/_distutils/core.py", line [20](https://github.com/Arize-ai/phoenix/actions/runs/5577666975/jobs/10190745313?pr=917#step:6:21)1, in run_commands
dist.run_commands()
File "/tmp/pip-build-env-0_kdszx7/overlay/lib/python3.8/site-packages/setuptools/_distutils/dist.py", line 969, in run_commands
self.run_command(cmd)
File "/tmp/pip-build-env-0_kdszx7/overlay/lib/python3.8/site-packages/setuptools/dist.py", line 1234, in run_command
super().run_command(command)
File "/tmp/pip-build-env-0_kdszx7/overlay/lib/python3.8/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
cmd_obj.run()
File "/tmp/pip-build-env-0_kdszx7/overlay/lib/python3.8/site-packages/wheel/bdist_wheel.py", line 343, in run
self.run_command("build")
File "/tmp/pip-build-env-0_kdszx7/overlay/lib/python3.8/site-packages/setuptools/_distutils/cmd.py", line 318, in run_command
self.distribution.run_command(command)
File "/tmp/pip-build-env-0_kdszx7/overlay/lib/python3.8/site-packages/setuptools/dist.py", line 1234, in run_command
super().run_command(command)
File "/tmp/pip-build-env-0_kdszx7/overlay/lib/python3.8/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
cmd_obj.run()
File "/tmp/pip-build-env-0_kdszx7/overlay/lib/python3.8/site-packages/setuptools/_distutils/command/build.py", line 131, in run
self.run_command(cmd_name)
File "/tmp/pip-build-env-0_kdszx7/overlay/lib/python3.8/site-packages/setuptools/_distutils/cmd.py", line 318, in run_command
self.distribution.run_command(command)
File "/tmp/pip-build-env-0_kdszx7/overlay/lib/python3.8/site-packages/setuptools/dist.py", line 1234, in run_command
super().run_command(command)
File "/tmp/pip-build-env-0_kdszx7/overlay/lib/python3.8/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
cmd_obj.run()
File "<string>", line 26, in run
File "/tmp/pip-build-env-0_kdszx7/overlay/lib/python3.8/site-packages/setuptools/_distutils/command/build_ext.py", line 345, in run
self.build_extensions()
File "/tmp/pip-build-env-0_kdszx7/overlay/lib/python3.8/site-packages/setuptools/_distutils/command/build_ext.py", line 467, in build_extensions
self._build_extensions_serial()
File "/tmp/pip-build-env-0_kdszx7/overlay/lib/python3.8/site-packages/setuptools/_distutils/command/build_ext.py", line 493, in _build_extensions_serial
self.build_extension(ext)
File "/tmp/pip-build-env-0_kdszx7/overlay/lib/python3.8/site-packages/Cython/Distutils/build_ext.py", line 1[22](https://github.com/Arize-ai/phoenix/actions/runs/5577666975/jobs/10190745313?pr=917#step:6:23), in build_extension
new_ext = cythonize(
File "/tmp/pip-build-env-0_kdszx7/overlay/lib/python3.8/site-packages/Cython/Build/Dependencies.py", line 1134, in cythonize
cythonize_one(*args)
File "/tmp/pip-build-env-0_kdszx7/overlay/lib/python3.8/site-packages/Cython/Build/Dependencies.py", line 1[30](https://github.com/Arize-ai/phoenix/actions/runs/5577666975/jobs/10190745313?pr=917#step:6:31)1, in cythonize_one
raise CompileError(None, pyx_file)
Cython.Compiler.Errors.CompileError: hdbscan/_hdbscan_linkage.pyx
[end of output]
We're seeing the same issue since today, on Linux x86-64 with Python 3.10. I noticed there were never any prebuilt wheels, so I assume we were already building hdbscan from source. Not sure what changed to cause the build failure now.
Having this problem as well. Installing using poetry. No changes to lock file. Was working last week.
This is also creating issues in Databricks. Cython released a new major version (3.0.0) a few hours ago, so there might be an issue with that on these managed environments: https://pypi.org/project/Cython/#history. I tried installing the package from master on WSL and it worked with all Python versions > 3.8 using the newest Cython. Anyway, it might be worth pinning all requirements to be less than the next major version, just to be on the safe side (see the sketch below).
EDIT: Databricks runtime 10.4 LTS has issues; 11.3 LTS and 12.2 LTS work fine.
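As a sketch, the kind of pinning I mean (the exact bounds here are illustrative, not tested):

```
# requirements.txt: keep build-sensitive deps below their next major version
cython>=0.27,<3
hdbscan>=0.8.29,<0.9
```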
Downgrading Cython to the previous release is not working for me. Still the same error.
Same here, and Colab doesn't have Cython 3 for me anyway.
I suggested Cython only because of the timing: they released a new major version right as these errors started popping up. It might not be related.
Downgrading Cython to 0.29.36 is also not working for me.
Having the same issue on Kaggle notebooks.
There was a recent sklearn release that changed some internals that hdbscan relied on (which prompted the 0.8.30 release to try to fix those). It's possible that this is the issue; can you check what sklearn version you have?
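Any standard check works, e.g.:

```
pip show scikit-learn
# or:
python -c "import sklearn; print(sklearn.__version__)"
```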
scikit-learn==1.2.2
Same issue on Ubuntu 18.04 using the Docker image python:3.8.12.
I'm at a bit of a loss, especially since 0.8.29 is also not building anymore. I can at least reproduce this locally, but it is unclear how to fix things: nothing that is currently breaking has changed in quite some time, so it isn't clear why it is breaking at all.
Okay, I poked at the obvious things in terms of module name resolution, and that seems to have fixed the problem locally. I don't understand what changed, or indeed why this particular change is now required, but given the scale of the issues people are having I'm going to push those changes out as a 0.8.31 release and hopefully that solves the problem for some people.
I have an idea: this might be caused by isolated builds. When I install the package, pip pulls down the most recent version of Cython, regardless of what's installed in my environment. The build requirement cython>=0.27 should be updated to cython>=0.27,<3 to prevent the latest major version of Cython from being used. (Comment is being updated as I test my hypothesis...)
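Concretely, a sketch of the pyproject.toml change I mean (the surrounding entries here are my guess at hdbscan's build requirements, not a copy of the actual file):

```toml
[build-system]
# Keep pip's isolated build environment from pulling Cython 3.x
requires = ["setuptools", "wheel", "numpy", "cython>=0.27,<3"]
build-backend = "setuptools.build_meta"
```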
The new patch kind of solved the issue for me: https://github.com/scikit-learn-contrib/hdbscan/releases/tag/0.8.31
Confirming that 0.8.31 is working for me.
For me too, which makes me believe @nchepanov's comment makes sense (i.e. the general release of Cython 3.0 caused the break). It aligns timing-wise too.
@nchepanov I believe you are correct; while the changes I made allowed Cython 3 to build hdbscan, there seem to be further issues at runtime. Until I have time to figure out and work through all the changes that Cython 3 requires, I have added a "<3" requirement for Cython. That seems to resolve all the issues as far as I can tell. I've pushed that out as 0.8.32, and hopefully that can keep things afloat for a while.
Thanks to everyone for flagging the issue and helping track down the source of the problem.
> I have an idea: this might be caused by isolated builds. When I install the package, pip pulls down the most recent version of Cython, regardless of what's installed in my environment. The build requirement cython>=0.27 should be updated to cython>=0.27,<3 [...]
This is more what I was thinking. I did remember something about isolated builds but could not locate it in the Python docs. I changed the build requirement to "cython<3" in pyproject.toml and managed to build hdbscan 0.8.30 under Databricks 10.4 LTS and Colab. The cython entry in requirements might not be needed, as it's not a runtime requirement (still testing this).
> I did remember something about isolated builds but could not locate it in the Python docs.

pip does not respect installed versions of packages in build-system.requires for PEP 517 packages. Yeah, you can also work around this by installing a working (pre-3.0) version of Cython and passing --no-build-isolation to pip install, which will stop pip from installing a newer Cython (3.x) just for the wheel build.
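Spelled out, that workaround is roughly:

```
# Install a pre-3.0 Cython into the current environment first...
pip install "cython<3"
# ...then build hdbscan against it rather than in an isolated build env.
pip install --no-build-isolation hdbscan
```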
0.8.31 is not working for me. I'm running hdbscan inside a dockerized application, and getting the following error:
```
Traceback (most recent call last):
  File "/usr/src/app/modules/cluster.py", line 26, in fit
    clusterer = HDBSCAN(min_cluster_size=min_cluster_size, min_samples=self.min_samples, cluster_selection_method=self.cluster_selection_method).fit(vectors)
  File "/usr/local/lib/python3.10/dist-packages/hdbscan/hdbscan_.py", line 1205, in fit
    ) = hdbscan(clean_data, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/hdbscan/hdbscan_.py", line 884, in hdbscan
    _tree_to_labels(
  File "/usr/local/lib/python3.10/dist-packages/hdbscan/hdbscan_.py", line 78, in _tree_to_labels
    condensed_tree = condense_tree(single_linkage_tree, min_cluster_size)
  File "hdbscan/_hdbscan_tree.pyx", line 43, in hdbscan._hdbscan_tree.condense_tree
  File "hdbscan/_hdbscan_tree.pyx", line 114, in hdbscan._hdbscan_tree.condense_tree
TypeError: 'numpy.float64' object cannot be interpreted as an integer
```
I'm using scikit-learn==1.2.2.
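For context, the failing call boils down to something like this (the data here is a placeholder; my real vectors come from the application):

```python
import numpy as np
from hdbscan import HDBSCAN

vectors = np.random.rand(1000, 50)  # stand-in for the real embeddings

# On the affected builds this raises, inside condense_tree:
#   TypeError: 'numpy.float64' object cannot be interpreted as an integer
clusterer = HDBSCAN(min_cluster_size=15).fit(vectors)
```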
> 0.8.31 is not working for me. I'm running hdbscan inside a dockerized application, and getting the following error: [...] TypeError: 'numpy.float64' object cannot be interpreted as an integer
This is also, unfortunately, the same runtime exception I'm hitting with 0.8.32.
So I definitely saw that runtime error with 0.8.31; in my testing it disappeared with 0.8.32. If it is still an issue in 0.8.32, then that's not so good. I was getting all green on the test suite (https://dev.azure.com/lelandmcinnes/HDBSCAN%20builds/_build/results?buildId=901&view=results), so I'm not sure what the lingering issue is. Perhaps try a clean re-install of 0.8.32?
My application is running in Docker. I did a clean rebuild and am still getting the error:
File "hdbscan/_hdbscan_tree.pyx", line 114, in hdbscan._hdbscan_tree.condense_tree
TypeError: 'numpy.float64' object cannot be interpreted as an integer
I also tried using the most recent scikit-learn release, to no effect.
@argonaut76 I'm loath to keep pushing new releases; 0.8.32 seems good on a swathe of platforms. I've pushed some changes to master, however, that may fix your problem. Can you install from GitHub within your Docker build and see if those resolve your issues?
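Installing current master with pip looks like:

```
pip install git+https://github.com/scikit-learn-contrib/hdbscan.git
```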
@lmcinnes sure, I'll give that a shot.
If it helps, here's some additional context:
As the base Docker image I'm using ubuntu:latest, which is currently Ubuntu 22.04. I have the following dependencies installed that could affect hdbscan:
Cython==0.29.36
hdbscan==0.8.32
joblib==1.3.1
numpy==1.25.1
scikit-learn==1.3.0
scipy==1.11.1
spacy==3.5.3
@lmcinnes that worked!
I'll give it a little while to ensure that the current master works for most people, and then try to push out a 0.8.33 late in the week that will hopefully get us over this little hurdle.
Thanks. What an odd problem.
@lmcinnes: I'm also hitting the same numpy.float64 issue as @argonaut76. If possible, I'd love a v0.8.33 release sooner rather than later in the week!