logger.warning() formatting issue in topic_model.save()

Open zilch42 opened this issue 1 year ago • 2 comments

Hi Maarten,

I'm getting the following error trying to save a topic model without an embedding model pointer.

from sklearn.datasets import fetch_20newsgroups
from bertopic import BERTopic

# Documents to train on
docs = fetch_20newsgroups(subset='all',  remove=('headers', 'footers', 'quotes'))['data'][0:500]
topic_model = BERTopic().fit(docs)

topic_model.save("model_dir", serialization="safetensors", save_ctfidf=True, save_embedding_model=False)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[1], line 8
      5 docs = fetch_20newsgroups(subset='all',  remove=('headers', 'footers', 'quotes'))['data'][0:500]
      6 topic_model = BERTopic().fit(docs)
----> 8 topic_model.save("model_dir", serialization="safetensors", save_ctfidf=True, save_embedding_model=False)

File c:\path\lib\site-packages\bertopic\_bertopic.py:2998, in BERTopic.save(self, path, serialization, save_embedding_model, save_ctfidf)
   2996     save_embedding_model = self.embedding_model._hf_model
   2997 elif not save_embedding_model:
-> 2998     logger.warning("You are saving a BERTopic model without explicitly defining an embedding model."
   2999                    "If you are using a sentence-transformers model or a HuggingFace model supported"
   3000                    "by sentence-transformers, please save the model by using a pointer towards that model."
   3001                    "For example, `save_embedding_model=sentence-transformers/all-mpnet-base-v2`", RuntimeWarning)
   3003 # Minimal
   3004 save_utils.save_hf(model=self, save_directory=save_directory, serialization=serialization)

TypeError: warning() takes 2 positional arguments but 3 were given

zilch42, Dec 14 '23 06:12

Thanks for sharing! This seems like it should be relatively easy to fix by making sure that logger.warning is passed a single string and not several arguments. If you want, a PR would be appreciated.

For now, just setting save_embedding_model to True would prevent this from happening and does not seem to have any downsides.
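
A minimal sketch of the failure and one possible fix, assuming the logger wrapper's warning() accepts only a single message argument, as the TypeError above suggests. SimpleLogger is a hypothetical stand-in for illustration, not BERTopic's actual class. Python already joins adjacent string literals into one string, so the extra positional argument here is the trailing RuntimeWarning.

```python
import logging


class SimpleLogger:
    """Hypothetical stand-in: a wrapper whose warning() takes only a message."""

    def __init__(self):
        self.logger = logging.getLogger("sketch")

    def warning(self, message):
        self.logger.warning(f"WARNING: {message}")


logger = SimpleLogger()

# Adjacent string literals are concatenated into ONE string.
# The trailing spaces keep the sentences from running together.
message = (
    "You are saving a BERTopic model without explicitly defining an embedding model. "
    "If you are using a sentence-transformers model or a HuggingFace model supported "
    "by sentence-transformers, please save the model by using a pointer towards that model. "
    "For example, `save_embedding_model=sentence-transformers/all-mpnet-base-v2`"
)

# Reproduces the reported failure: RuntimeWarning is an extra positional argument.
try:
    logger.warning(message, RuntimeWarning)
    raised = False
    error_text = ""
except TypeError as err:
    raised = True
    error_text = str(err)

# Possible fix: pass only the message.
logger.warning(message)
```

With this change the call goes through, and the warning text also gains the spaces between sentences that the original literals were missing.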

MaartenGr, Dec 15 '23 10:12

Thanks Maarten, I'm about to finish up for the year, but if this is still open in January I'll submit one then.

zilch42, Dec 17 '23 23:12