BERTopic on AzureML?

Open jnicholls82 opened this issue 2 years ago • 3 comments

Hello, has anyone successfully got BERTopic running on AzureML?

Environment: Azure ML 3.8

Having installed BERTopic (pip install BERTopic), I then used the following starter code (from the BERTopic GitHub):

from bertopic import BERTopic
from sklearn.datasets import fetch_20newsgroups

docs = fetch_20newsgroups(subset='all', remove=('headers', 'footers', 'quotes'))['data']

topic_model = BERTopic()
topics, probs = topic_model.fit_transform(docs)

After running for around 4 minutes, this gives the following error:

UFuncTypeError: ufunc 'correct_alternative_cosine' did not contain a loop with signature matching types <class 'numpy.dtype[float32]'> -> None

Any support would be gratefully received!

Regards, James

jnicholls82 • Aug 14 '22 14:08

This is the full error stack:

2022-08-14 14:23:59,573 - BERTopic - Transformed documents to Embeddings
2022-08-14 14:24:11,356 - BERTopic - The dimensionality reduction algorithm did not contain the y parameter and therefore the y parameter was not used

UFuncTypeError Traceback (most recent call last)
File /anaconda/envs/azureml_py38/lib/python3.8/site-packages/bertopic/_bertopic.py:1391, in BERTopic._reduce_dimensionality(self, embeddings, y)
   1390 try:
-> 1391     self.umap_model.fit(embeddings, y=y)
   1392 except TypeError:

File /anaconda/envs/azureml_py38/lib/python3.8/site-packages/umap/umap_.py:2516, in UMAP.fit(self, X, y)
   2511 if self.knn_dists is None:
   2512     (
   2513         self._knn_indices,
   2514         self._knn_dists,
   2515         self._knn_search_index,
-> 2516     ) = nearest_neighbors(
   2517         X[index],
   2518         self._n_neighbors,
   2519         nn_metric,
   2520         self._metric_kwds,
   2521         self.angular_rp_forest,
   2522         random_state,
   2523         self.low_memory,
   2524         use_pynndescent=True,
   2525         n_jobs=self.n_jobs,
   2526         verbose=self.verbose,
   2527     )
   2528 else:

File /anaconda/envs/azureml_py38/lib/python3.8/site-packages/umap/umap_.py:342, in nearest_neighbors(X, n_neighbors, metric, metric_kwds, angular, random_state, low_memory, use_pynndescent, n_jobs, verbose)
    328 knn_search_index = NNDescent(
    329     X,
    330     n_neighbors=n_neighbors,
   (...)
    340     compressed=False,
    341 )
--> 342 knn_indices, knn_dists = knn_search_index.neighbor_graph
    344 if verbose:

File /anaconda/envs/azureml_py38/lib/python3.8/site-packages/pynndescent/pynndescent_.py:1532, in NNDescent.neighbor_graph(self)
   1529 if self._distance_correction is not None:
   1530     result = (
   1531         self._neighbor_graph[0].copy(),
-> 1532         self._distance_correction(self._neighbor_graph[1]),
   1533     )
   1534 else:

UFuncTypeError: ufunc 'correct_alternative_cosine' did not contain a loop with signature matching types <class 'numpy.dtype[float32]'> -> None

During handling of the above exception, another exception occurred:

UFuncTypeError Traceback (most recent call last)
Input In [3], in
      1 topic_model = BERTopic(verbose=True)
----> 2 topics, probs = topic_model.fit_transform(docs)

File /anaconda/envs/azureml_py38/lib/python3.8/site-packages/bertopic/_bertopic.py:306, in BERTopic.fit_transform(self, documents, embeddings, y)
    304 if self.seed_topic_list is not None and self.embedding_model is not None:
    305     y, embeddings = self._guided_topic_modeling(embeddings)
--> 306 umap_embeddings = self._reduce_dimensionality(embeddings, y)
    308 # Cluster reduced embeddings
    309 documents, probabilities = self._cluster_embeddings(umap_embeddings, documents)

File /anaconda/envs/azureml_py38/lib/python3.8/site-packages/bertopic/_bertopic.py:1395, in BERTopic._reduce_dimensionality(self, embeddings, y)
   1392 except TypeError:
   1393     logger.info("The dimensionality reduction algorithm did not contain the y parameter and"
   1394                 " therefore the y parameter was not used")
-> 1395     self.umap_model.fit(embeddings)
   1397 umap_embeddings = self.umap_model.transform(embeddings)
   1398 logger.info("Reduced dimensionality")

File /anaconda/envs/azureml_py38/lib/python3.8/site-packages/umap/umap_.py:2516, in UMAP.fit(self, X, y)
   2510     nn_metric = self._input_distance_func
   2511 if self.knn_dists is None:
   2512     (
   2513         self._knn_indices,
   2514         self._knn_dists,
   2515         self._knn_search_index,
-> 2516     ) = nearest_neighbors(
   2517         X[index],
   2518         self._n_neighbors,
   2519         nn_metric,
   2520         self._metric_kwds,
   2521         self.angular_rp_forest,
   2522         random_state,
   2523         self.low_memory,
   2524         use_pynndescent=True,
   2525         n_jobs=self.n_jobs,
   2526         verbose=self.verbose,
   2527     )
   2528 else:
   2529     self._knn_indices = self.knn_indices

File /anaconda/envs/azureml_py38/lib/python3.8/site-packages/umap/umap_.py:342, in nearest_neighbors(X, n_neighbors, metric, metric_kwds, angular, random_state, low_memory, use_pynndescent, n_jobs, verbose)
    326 n_iters = max(5, int(round(np.log2(X.shape[0]))))
    328 knn_search_index = NNDescent(
    329     X,
    330     n_neighbors=n_neighbors,
   (...)
    340     compressed=False,
    341 )
--> 342 knn_indices, knn_dists = knn_search_index.neighbor_graph
    344 if verbose:
    345     print(ts(), "Finished Nearest Neighbor Search")

File /anaconda/envs/azureml_py38/lib/python3.8/site-packages/pynndescent/pynndescent_.py:1532, in NNDescent.neighbor_graph(self)
   1528     return None
   1529 if self._distance_correction is not None:
   1530     result = (
   1531         self._neighbor_graph[0].copy(),
-> 1532         self._distance_correction(self._neighbor_graph[1]),
   1533     )
   1534 else:
   1535     result = (self._neighbor_graph[0].copy(), self._neighbor_graph[1].copy())

UFuncTypeError: ufunc 'correct_alternative_cosine' did not contain a loop with signature matching types <class 'numpy.dtype[float32]'> -> None

jnicholls82 • Aug 14 '22 14:08
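
The traceback above ends inside pynndescent's cosine distance correction rather than in BERTopic itself, so the failure can probably be reproduced without downloading the dataset or computing embeddings. The following is a minimal sketch, assuming numpy and pynndescent are importable in the same environment; the helper name check_cosine_neighbor_graph and the random float32 data are illustrative and not from the thread.

```python
def check_cosine_neighbor_graph(n_samples=200, n_features=16, seed=0):
    """Build a small cosine NNDescent index and read its neighbor graph.

    Returns "missing" if numpy or pynndescent cannot be imported, "ok" if
    the neighbor graph was computed, or "error: ..." with the exception text.
    (Hypothetical helper, written for illustration only.)
    """
    try:
        import numpy as np
        from pynndescent import NNDescent
    except ImportError:
        return "missing"

    rng = np.random.default_rng(seed)
    # float32 input, matching the dtype named in the error message
    data = rng.random((n_samples, n_features)).astype(np.float32)

    try:
        index = NNDescent(data, metric="cosine", n_neighbors=10, random_state=seed)
        # Reading neighbor_graph applies the cosine distance correction,
        # which is the call that fails in the traceback.
        indices, distances = index.neighbor_graph
    except Exception as exc:
        return f"error: {type(exc).__name__}: {exc}"
    return "ok"


status = check_cosine_neighbor_graph()
print(status)
```

If this prints an error mentioning correct_alternative_cosine, the problem sits in the installed pynndescent and its dependencies, independent of BERTopic and AzureML.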

There might be an issue with the packages that were already installed in your environment. It might be worthwhile to run pip install --upgrade bertopic instead, to get the most recent packages. Moreover, you can find some solutions to your problem here that you can try out.

MaartenGr • Aug 18 '22 10:08
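
Since the suggestion above points at packages already present in the environment, it may help to record which versions are actually installed before and after upgrading. This is a small sketch using the standard library's importlib.metadata (available from Python 3.8); the package list is an assumption based on the modules named in the traceback plus numba and numpy, and the helper name installed_versions is illustrative.

```python
from importlib import metadata

# Distributions seen in the traceback, plus numba and numpy.
# This selection is an assumption, not something stated in the thread.
PACKAGES = ["bertopic", "umap-learn", "pynndescent", "numba", "numpy"]


def installed_versions(packages=PACKAGES):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


versions = installed_versions()
for name, version in versions.items():
    print(f"{name}: {version or 'not installed'}")
```

Running it inside the notebook kernel, rather than in a separate terminal, ensures the versions reported are the ones the azureml_py38 environment actually imports.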

I'm not running on AzureML, but I had the same error, and the following fixes worked:
https://github.com/lmcinnes/pynndescent/issues/163#issuecomment-1016694682
https://github.com/lmcinnes/pynndescent/issues/163#issuecomment-1025082538

deoxyribose • Sep 05 '22 13:09

Due to inactivity, I'll be closing this for now. If you have any questions or want to continue the discussion, I'll make sure to re-open the issue!

MaartenGr • Jan 09 '23 12:01