Top2Vec
TypeError: 'numpy.float64' object cannot be interpreted as an integer
Hi there,
When trying to run the example code I encounter the following:
from sklearn.datasets import fetch_20newsgroups
from top2vec import Top2Vec
newsgroups = fetch_20newsgroups(subset='all', remove=('headers', 'footers', 'quotes'))
model = Top2Vec(documents=newsgroups.data, speed="learn", workers=8)
2023-07-20 13:51:37,083 - top2vec - INFO - Pre-processing documents for training
2023-07-20 13:51:48,891 - top2vec - INFO - Creating joint document/word embedding
2023-07-20 14:01:43,811 - top2vec - INFO - Creating lower dimension embedding of documents
2023-07-20 14:02:09,146 - top2vec - INFO - Finding dense areas of documents
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/Users/thedmitry/Library/r-miniconda-arm64/envs/r-reticulate/lib/python3.9/site-packages/top2vec/Top2Vec.py", line 666, in __init__
self.compute_topics(umap_args=umap_args, hdbscan_args=hdbscan_args, topic_merge_delta=topic_merge_delta)
File "/Users/thedmitry/Library/r-miniconda-arm64/envs/r-reticulate/lib/python3.9/site-packages/top2vec/Top2Vec.py", line 1266, in compute_topics
cluster = hdbscan.HDBSCAN(**hdbscan_args).fit(umap_model.embedding_)
File "/Users/thedmitry/Library/r-miniconda-arm64/envs/r-reticulate/lib/python3.9/site-packages/hdbscan/hdbscan_.py", line 1205, in fit
) = hdbscan(clean_data, **kwargs)
File "/Users/thedmitry/Library/r-miniconda-arm64/envs/r-reticulate/lib/python3.9/site-packages/hdbscan/hdbscan_.py", line 884, in hdbscan
_tree_to_labels(
File "/Users/thedmitry/Library/r-miniconda-arm64/envs/r-reticulate/lib/python3.9/site-packages/hdbscan/hdbscan_.py", line 80, in _tree_to_labels
labels, probabilities, stabilities = get_clusters(
File "hdbscan/_hdbscan_tree.pyx", line 659, in hdbscan._hdbscan_tree.get_clusters
File "hdbscan/_hdbscan_tree.pyx", line 733, in hdbscan._hdbscan_tree.get_clusters
TypeError: 'numpy.float64' object cannot be interpreted as an integer
All of the libraries are updated to the latest versions, and I have also tried downgrading numpy and hdbscan, with no result.
I am fairly new to Python and not sure if there's something I am doing wrong here. I did see some discussion of this error on the hdbscan issues page, but their solution there was to upgrade to the most recent version, which did not help in my case.
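For anyone else new to Python, the message itself can be reproduced outside of Top2Vec. The following is a minimal sketch, assuming only that NumPy is installed. It is not the hdbscan code path from the traceback, just an illustration of what the error means: a NumPy float was passed where Python requires an integer. The helper name `try_range` is made up for this example.

```python
import numpy as np


def try_range(n):
    """Return list(range(n)), or the TypeError message if n is rejected."""
    try:
        return list(range(n))
    except TypeError as exc:
        return str(exc)


value = np.float64(3.0)

# range() needs an integer-like object; a float64 is rejected even when it
# holds a whole number.
error_message = try_range(value)

# Converting explicitly to int avoids the error.
fixed = try_range(int(value))

print(error_message)
print(fixed)
```

This suggests the error originates inside a library that ends up with a float where it expects an integer index or count, rather than in the example code itself.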
I am running into the same problem
I have the same issue. All embedding models ran into this error. Using Python 3.10 right now!
So, I switched to a different method but encountered the same error there. I am using Python 3.11, so YMMV, but what helped me was installing older versions of a couple of libraries. I'm not sure whether the second line is required for top2vec.
%pip install --user --no-warn-script-location --disable-pip-version-check Cython==0.29.34 numpy==1.23.5
%pip install --user --no-warn-script-location --disable-pip-version-check --no-build-isolation hdbscan==0.8.29
Folks, I found the problem and a "fix"! It's actually a gcc and hdbscan problem: a C/C++ compiler such as gcc seems to be a dependency for building hdbscan. The fix for me was installing VC++ 2022 and adding the C++ Desktop Development package. pip install now works for hdbscan, which lets top2vec run properly. I hope this helps!
That did not work for me. What did work was uninstalling hdbscan, then cloning the repository and installing it manually, as per https://github.com/scikit-learn-contrib/hdbscan/issues/607
It is indeed a problem with HDBSCAN, related to this issue.
Updating HDBSCAN to 0.8.33 worked for me.
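Since the fixes in this thread all depend on which versions are installed, it can help to check them before and after changing anything. Below is a rough sketch using only the standard library (`importlib.metadata`, Python 3.8+). The helpers `parse_version` and `installed_version` are hypothetical names for this example, and the 0.8.33 threshold is simply the version reported as working above, not an official minimum.

```python
from importlib.metadata import PackageNotFoundError, version

# Version reported as working in this thread (an assumption, not a guarantee).
FIXED_HDBSCAN = (0, 8, 33)


def parse_version(text):
    """Turn '0.8.33' into (0, 8, 33); stops at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


for name in ("top2vec", "hdbscan", "numpy", "Cython"):
    print(f"{name}: {installed_version(name) or 'not installed'}")

hdbscan_version = installed_version("hdbscan")
if hdbscan_version is not None and parse_version(hdbscan_version) < FIXED_HDBSCAN:
    print("hdbscan is older than 0.8.33; upgrading may resolve the error")
```

Run it in the same environment (or notebook kernel) that raises the error, since a different interpreter may see different packages.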