cannot import BERTopic --> issue links to huggingface huggingface_hub repo

Open SDAravind opened this issue 2 years ago • 1 comments

Firstly, thanks for making this library and open-sourcing it. It's really awesome.

I had used BERTopic before, but today I couldn't run it after installing it. It threw the error below. I tried to resolve it but couldn't get any further. Hugging Face has made some changes to huggingface_hub (link). Any direction on how to resolve this?

---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
~/GitHub/learning/l_ml/l_bertopic.ipynb Cell 1' in <cell line: 1>()
----> 1 from bertopic import BERTopic
      2 from sklearn.datasets import fetch_20newsgroups
      4 docs = fetch_20newsgroups(subset='all',  remove=('headers', 'footers', 'quotes'))['data']

File ~/.cache/pypoetry/virtualenvs/clustering-ho3GNO8w-py3.8/lib/python3.8/site-packages/bertopic/__init__.py:1, in <module>
----> 1 from bertopic._bertopic import BERTopic
      3 __version__ = "0.10.0"
      5 __all__ = [
      6     "BERTopic",
      7 ]

File ~/.cache/pypoetry/virtualenvs/clustering-ho3GNO8w-py3.8/lib/python3.8/site-packages/bertopic/_bertopic.py:31, in <module>
     29 from bertopic._utils import MyLogger, check_documents_type, check_embeddings_shape, check_is_fitted
     30 from bertopic._mmr import mmr
---> 31 from bertopic.backend._utils import select_backend
     32 from bertopic import plotting
     34 # Visualization

File ~/.cache/pypoetry/virtualenvs/clustering-ho3GNO8w-py3.8/lib/python3.8/site-packages/bertopic/backend/__init__.py:2, in <module>
      1 from ._base import BaseEmbedder
----> 2 from ._word_doc import WordDocEmbedder
      3 from ._utils import languages
...
   (...)
    418     use_auth_token: Union[bool, str, None] = None
    419 ) -> str:

ImportError: cannot import name 'REPO_ID_SEPARATOR' from 'huggingface_hub.snapshot_download' (~/.cache/pypoetry/virtualenvs/clustering-ho3GNO8w-py3.8/lib/python3.8/site-packages/huggingface_hub/snapshot_download.py)

When I looked into this, I saw that Hugging Face has made snapshot_download.py private, as shown below.

# TODO: remove in 0.11

import warnings


warnings.warn(
    "snapshot_download.py has been made private and will no longer be available from"
    " version 0.11. Please use `from huggingface_hub import snapshot_download` to"
    " import the only public function in this module. Other members of the file may be"
    " changed without a deprecation notice.",
    FutureWarning,
)

from ._snapshot_download import *  # noqa
from .constants import REPO_ID_SEPARATOR  # noqa

SDAravind, Jun 18 '22 14:06

Thanks for sharing this! This is a known issue in the sentence-transformers framework, and a fix will be released soon. You can find a bit more about that here. In the meantime, you can install sentence-transformers from the master branch, where it is already fixed.

MaartenGr, Jun 19 '22 06:06
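
For anyone checking whether their own environment is affected, here is a minimal diagnostic sketch. It is not part of BERTopic or sentence-transformers. The helper names `installed_version` and `can_import_name` are hypothetical and use only the standard library. The sketch reports the installed versions of the three packages involved and tests whether the name from the traceback can still be imported from `huggingface_hub.snapshot_download`.

```python
from importlib import import_module
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def can_import_name(module_name, attr):
    """Return True if `module_name` imports cleanly and exposes `attr`."""
    try:
        module = import_module(module_name)
    except ImportError:
        # Covers both a missing module and a module that fails on import.
        return False
    return hasattr(module, attr)


if __name__ == "__main__":
    # Packages named in the traceback and in the reply above.
    for pkg in ("bertopic", "sentence-transformers", "huggingface_hub"):
        print(f"{pkg}: {installed_version(pkg)}")

    # The exact import that fails in the traceback.
    ok = can_import_name("huggingface_hub.snapshot_download", "REPO_ID_SEPARATOR")
    print("REPO_ID_SEPARATOR importable from huggingface_hub.snapshot_download:", ok)
```

If the last line prints `False`, the installed sentence-transformers release likely still relies on the old import location. Installing sentence-transformers from its master branch, as suggested above, would be the next thing to try, for example with `pip install git+https://github.com/UKPLab/sentence-transformers.git` (repository URL assumed here, not given in the thread).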