
GPU error

Open research2023 opened this issue 3 years ago • 16 comments


TypeError                                 Traceback (most recent call last)
in <module>
----> 1 from bertopic import BERTopic
      2 from cuml.cluster import HDBSCAN
      3 from cuml.manifold import UMAP
      4 # Create instances of GPU-accelerated UMAP and HDBSCAN
      5 umap_model = UMAP(n_components=5, n_neighbors=15, min_dist=0.0)

3 frames
/usr/local/lib/python3.7/dist-packages/hdbscan/hdbscan_.py in <module>
    507     leaf_size=40,
    508     algorithm="best",
--> 509     memory=Memory(cachedir=None, verbose=0),
    510     approx_min_span_tree=True,
    511     gen_min_span_tree=False,

TypeError: __init__() got an unexpected keyword argument 'cachedir'

research2023 avatar Sep 18 '22 21:09 research2023

This is the error I get when trying to run the Google Colab version of your package. I am trying to run the GPU-accelerated version so that I can process larger datasets.

research2023 avatar Sep 18 '22 21:09 research2023

!nvidia-smi

research2023 avatar Sep 18 '22 21:09 research2023

This gets the RAPIDS-Colab install files and checks your GPU. Run this and the next cell only.

Please read the output of this cell. If your Colab Instance is not RAPIDS compatible, it will warn you and give you remediation steps.

!pip install pynvml
!pip install bertopic
!git clone https://github.com/rapidsai/rapidsai-csp-utils.git
!python rapidsai-csp-utils/colab/env-check.py

research2023 avatar Sep 18 '22 21:09 research2023

This will update the Colab environment and restart the kernel. Don't run the next cell until you see the session crash.

!bash rapidsai-csp-utils/colab/update_gcc.sh
import os
os._exit(00)

research2023 avatar Sep 18 '22 21:09 research2023

This will install CondaColab. This will restart your kernel one last time. Run this cell by itself and only run the next cell once you see the session crash.

import condacolab
condacolab.install()

research2023 avatar Sep 18 '22 21:09 research2023

You can now run the rest of the cells as normal.

import condacolab
condacolab.check()

research2023 avatar Sep 18 '22 21:09 research2023

RAPIDS is now installed with 'python rapidsai-csp-utils/colab/install_rapids.py' followed by a release option.

The options are 'stable' and 'nightly'. Leaving it blank or adding any other word defaults to stable.

!python rapidsai-csp-utils/colab/install_rapids.py stable
import os
os.environ['NUMBAPRO_NVVM'] = '/usr/local/cuda/nvvm/lib64/libnvvm.so'
os.environ['NUMBAPRO_LIBDEVICE'] = '/usr/local/cuda/nvvm/libdevice/'
os.environ['CONDA_PREFIX'] = '/usr/local'

research2023 avatar Sep 18 '22 21:09 research2023

from bertopic import BERTopic
from cuml.cluster import HDBSCAN
from cuml.manifold import UMAP
# Create instances of GPU-accelerated UMAP and HDBSCAN
umap_model = UMAP(n_components=5, n_neighbors=15, min_dist=0.0)
hdbscan_model = HDBSCAN(min_samples=10, gen_min_span_tree=True)
# Pass the above models to be used in BERTopic
#topic_model = BERTopic(umap_model=umap_model, hdbscan_model=hdbscan_model)
#topics, probs = topic_model.fit_transform(docs)

research2023 avatar Sep 18 '22 21:09 research2023

I have pasted the code above. It is based on the recommendation on your FAQ page for using a GPU to speed up the model.
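Because the traceback starts at the very first import, it can help to import each dependency on its own and see which one actually breaks. The following is only a minimal sketch: the helper name `import_report` is hypothetical, and the package list is simply the set of modules named in this thread. It catches `Exception` rather than `ImportError` on purpose, since the failures reported here surface as `TypeError` and `Exception`.

```python
import importlib


def import_report(module_names):
    """Try to import each module and record 'ok' or the error it raised.

    Hypothetical helper for illustration; not part of BERTopic or RAPIDS.
    """
    report = {}
    for name in module_names:
        try:
            importlib.import_module(name)
            report[name] = "ok"
        except Exception as err:  # broad on purpose: see the lead-in
            report[name] = f"{type(err).__name__}: {err}"
    return report


if __name__ == "__main__":
    # Modules mentioned in this thread, lowest-level dependencies first.
    for name, status in import_report(
        ["joblib", "hdbscan", "umap", "cuml", "bertopic"]
    ).items():
        print(f"{name:10s} {status}")
```

Running this in a fresh cell after installation shows whether the problem sits in BERTopic itself or in one of the packages it imports.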

research2023 avatar Sep 18 '22 21:09 research2023

https://medium.com/rapids-ai/accelerating-topic-modeling-with-rapids-and-bert-models-be9909eeed2 Even the authors of this post recommend your methods instead of theirs.

research2023 avatar Sep 18 '22 21:09 research2023

Please let me know if I am doing something wrong or how to proceed.

research2023 avatar Sep 18 '22 21:09 research2023

Based on your error message, a recent commit in the HDBSCAN repo, and the fact that joblib was recently updated, the problem should be resolved either by installing HDBSCAN from its main branch or by installing an earlier version of joblib after installing BERTopic:

pip install --upgrade joblib==1.1.0

Hopefully, this should fix your issue!

Also, a small tip: whenever you post code in a GitHub issue, putting it in a code block makes it much easier to read.
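To confirm that the joblib version is the cause before pinning it, one option is to check whether the installed `joblib.Memory` still accepts the `cachedir` keyword that the traceback complains about. This is a rough sketch, not a definitive check: `accepts_keyword`, `OldStyleMemory`, and `NewStyleMemory` are hypothetical names, and the two stand-in classes only imitate the relevant difference between constructors.

```python
import inspect


def accepts_keyword(func, name):
    """Return True if `func` can be called with the keyword argument `name`."""
    params = inspect.signature(func).parameters
    # A **kwargs parameter accepts any keyword.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return True
    keyword_kinds = (
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
        inspect.Parameter.KEYWORD_ONLY,
    )
    return name in params and params[name].kind in keyword_kinds


# Hypothetical stand-ins: one constructor with `cachedir`, one without.
class OldStyleMemory:
    def __init__(self, location=None, cachedir=None, verbose=1):
        pass


class NewStyleMemory:
    def __init__(self, location=None, verbose=1):
        pass


try:
    from joblib import Memory  # only checked if joblib is installed
except ImportError:
    Memory = None

if Memory is not None:
    print("joblib.Memory accepts cachedir:", accepts_keyword(Memory, "cachedir"))
```

If this prints `False`, the installed joblib is one that the released HDBSCAN code in the traceback cannot call with `cachedir`, which is consistent with the two fixes suggested above.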

MaartenGr avatar Sep 18 '22 22:09 MaartenGr

Exception                                 Traceback (most recent call last)
in <module>
----> 1 from bertopic import BERTopic
      2 from cuml.cluster import HDBSCAN
      3 from cuml.manifold import UMAP
      4 # Create instances of GPU-accelerated UMAP and HDBSCAN
      5 umap_model = UMAP(n_components=5, n_neighbors=15, min_dist=0.0)

16 frames
/usr/local/lib/python3.7/site-packages/bertopic/__init__.py in <module>
----> 1 from bertopic._bertopic import BERTopic
      2
      3 __version__ = "0.12.0"
      4
      5 __all__ = [

/usr/local/lib/python3.7/site-packages/bertopic/_bertopic.py in <module>
     22 # Models
     23 import hdbscan
---> 24 from umap import UMAP
     25 from sklearn.preprocessing import normalize
     26 from sklearn import __version__ as sklearn_version

/usr/local/lib/python3.7/site-packages/umap/__init__.py in <module>
      1 from warnings import warn, catch_warnings, simplefilter
----> 2 from .umap_ import UMAP
      3
      4 try:
      5     with catch_warnings():

/usr/local/lib/python3.7/site-packages/umap/umap_.py in <module>
     30 import umap.distances as dist
     31
---> 32 import umap.sparse as sparse
     33
     34 from umap.utils import (

/usr/local/lib/python3.7/site-packages/umap/sparse.py in <module>
     10 import numpy as np
     11
---> 12 from umap.utils import norm
     13
     14 locale.setlocale(locale.LC_NUMERIC, "C")

/usr/local/lib/python3.7/site-packages/umap/utils.py in <module>
     38
     39
---> 40 @numba.njit("i4(i8[:])")
     41 def tau_rand_int(state):
     42     """A fast (pseudo)-random number generator.

/usr/local/lib/python3.7/site-packages/numba/core/decorators.py in wrapper(func)
    217             with typeinfer.register_dispatcher(disp):
    218                 for sig in sigs:
--> 219                     disp.compile(sig)
    220                 disp.disable_compile()
    221             return disp

/usr/local/lib/python3.7/site-packages/numba/core/dispatcher.py in compile(self, sig)
    963             with ev.trigger_event("numba:compile", data=ev_details):
    964                 try:
--> 965                     cres = self._compiler.compile(args, return_type)
    966                 except errors.ForceLiteralArg as e:
    967                     def folded(args, kws):

/usr/local/lib/python3.7/site-packages/numba/core/dispatcher.py in compile(self, args, return_type)
    123
    124     def compile(self, args, return_type):
--> 125         status, retval = self._compile_cached(args, return_type)
    126         if status:
    127             return retval

/usr/local/lib/python3.7/site-packages/numba/core/dispatcher.py in _compile_cached(self, args, return_type)
    137
    138         try:
--> 139             retval = self._compile_core(args, return_type)
    140         except errors.TypingError as e:
    141             self._failed_cache[key] = e

/usr/local/lib/python3.7/site-packages/numba/core/dispatcher.py in _compile_core(self, args, return_type)
    155                                       args=args, return_type=return_type,
    156                                       flags=flags, locals=self.locals,
--> 157                                       pipeline_class=self.pipeline_class)
    158         # Check typing error if object mode is used
    159         if cres.typing_error is not None and not flags.enable_pyobject:

/usr/local/lib/python3.7/site-packages/numba/core/compiler.py in compile_extra(typingctx, targetctx, func, args, return_type, flags, locals, library, pipeline_class)
    690     """
    691     pipeline = pipeline_class(typingctx, targetctx, library,
--> 692                               args, return_type, flags, locals)
    693     return pipeline.compile_extra(func)
    694

/usr/local/lib/python3.7/site-packages/numba/core/compiler.py in __init__(self, typingctx, targetctx, library, args, return_type, flags, locals)
    383         # Make sure the environment is reloaded
    384         config.reload_config()
--> 385         typingctx.refresh()
    386         targetctx.refresh()
    387

/usr/local/lib/python3.7/site-packages/numba/core/typing/context.py in refresh(self)
    156         Useful for third-party extensions.
    157         """
--> 158         self.load_additional_registries()
    159         # Some extensions may have augmented the builtin registry
    160         self._load_builtins()

/usr/local/lib/python3.7/site-packages/numba/core/typing/context.py in load_additional_registries(self)
    699
    700     def load_additional_registries(self):
--> 701         from . import (
    702             cffi_utils,
    703             cmathdecl,

/usr/local/lib/python3.7/site-packages/numba/core/typing/cffi_utils.py in <module>
     17 try:
     18     import cffi
---> 19     ffi = cffi.FFI()
     20 except ImportError:
     21     ffi = None

/usr/local/lib/python3.7/dist-packages/cffi/api.py in __init__(self, backend)
     54                 raise Exception("Version mismatch: this is the 'cffi' package version %s, located in %r. When we import the top-level '_cffi_backend' extension module, we get version %s, located in %r. The two versions should be equal; check your installation." % (
     55                     __version__, __file__,
---> 56                     backend.__version__, backend.__file__))
     57         else:
     58             # PyPy

Exception: Version mismatch: this is the 'cffi' package version 1.15.1, located in '/usr/local/lib/python3.7/dist-packages/cffi/api.py'. When we import the top-level '_cffi_backend' extension module, we get version 1.15.0, located in '/usr/local/lib/python3.7/site-packages/_cffi_backend.cpython-37m-x86_64-linux-gnu.so'. The two versions should be equal; check your installation.

research2023 avatar Sep 19 '22 01:09 research2023

!pip install bertopic
!pip install --upgrade joblib==1.1.0

research2023 avatar Sep 19 '22 01:09 research2023

I did as you recommended and I obtained a version mismatch.
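The mismatch message points at two copies of cffi, one under dist-packages and one under site-packages. A quick way to look for other packages in the same state is to list every installed distribution and flag names that appear with more than one version. This is a sketch under assumptions: it needs Python 3.8 or newer for `importlib.metadata` (the environment in this thread is 3.7), and both function names are hypothetical.

```python
from collections import defaultdict
from importlib import metadata  # standard library from Python 3.8 onward


def installed_records():
    """Yield (name, version, location) for each distribution found on sys.path."""
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:
            yield name.lower(), dist.version, str(dist.locate_file(""))


def conflicting_versions(records):
    """Return {name: [(version, location), ...]} for names seen with 2+ versions."""
    by_name = defaultdict(set)
    for name, version, location in records:
        by_name[name].add((version, location))
    return {
        name: sorted(found)
        for name, found in by_name.items()
        if len({version for version, _ in found}) > 1
    }


if __name__ == "__main__":
    for name, found in conflicting_versions(installed_records()).items():
        print(name)
        for version, location in found:
            print(f"  {version}  {location}")
```

Any package it reports is installed twice with different versions, which is the same situation the cffi error describes.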

research2023 avatar Sep 19 '22 01:09 research2023

Rather than using the RAPIDS Colab script, I recommend combining the suggestion above with this comment about SageMaker Studio Lab (perhaps using newer versions as needed/useful).

beckernick avatar Sep 19 '22 16:09 beckernick

I combined what beckernick suggested with some of MaartenGr's recommendations, and it worked. Processing is now considerably faster. I appreciate the help from both of you. It took a couple of tries, but the code ended up working.

research2023 avatar Sep 25 '22 17:09 research2023