BERTopic
logger.warning() formatting issue in topic_model.save()
Hi Maarten,
I'm getting the following error trying to save a topic model without an embedding model pointer.
from sklearn.datasets import fetch_20newsgroups
from bertopic import BERTopic
# Documents to train on
docs = fetch_20newsgroups(subset='all', remove=('headers', 'footers', 'quotes'))['data'][0:500]
topic_model = BERTopic().fit(docs)
topic_model.save("model_dir", serialization="safetensors", save_ctfidf=True, save_embedding_model=False)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In[1], line 8
5 docs = fetch_20newsgroups(subset='all', remove=('headers', 'footers', 'quotes'))['data'][0:500]
6 topic_model = BERTopic().fit(docs)
----> 8 topic_model.save("model_dir", serialization="safetensors", save_ctfidf=True, save_embedding_model=False)
File c:\path\lib\site-packages\bertopic\_bertopic.py:2998, in BERTopic.save(self, path, serialization, save_embedding_model, save_ctfidf)
2996 save_embedding_model = self.embedding_model._hf_model
2997 elif not save_embedding_model:
-> 2998 logger.warning("You are saving a BERTopic model without explicitly defining an embedding model."
2999 "If you are using a sentence-transformers model or a HuggingFace model supported"
3000 "by sentence-transformers, please save the model by using a pointer towards that model."
3001 "For example, `save_embedding_model=sentence-transformers/all-mpnet-base-v2`", RuntimeWarning)
3003 # Minimal
3004 save_utils.save_hf(model=self, save_directory=save_directory, serialization=serialization)
TypeError: warning() takes 2 positional arguments but 3 were given
Thanks for sharing! This seems like it should be relatively easy to fix by making sure that a single string, rather than several, is passed to logger.warning. If you want, a PR would be appreciated.
For now, just setting save_embedding_model to True would prevent this from happening and does not seem to have any downsides.
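To make the failure in the traceback above concrete, here is a minimal sketch that reproduces it without BERTopic. It assumes, as the TypeError suggests, that the logger's warning() accepts only a message; MessageOnlyLogger is a hypothetical stand-in, not BERTopic's actual logger class. Python joins adjacent string literals into one string, so in this sketch the extra positional argument is RuntimeWarning rather than the pieces of the message. Because the literals are joined as-is, each piece also needs a trailing space to keep the sentences from running together.

```python
class MessageOnlyLogger:
    """Hypothetical stand-in: warning() takes a message and nothing else."""

    def __init__(self):
        self.messages = []

    def warning(self, message):
        self.messages.append(message)


logger = MessageOnlyLogger()

# Adjacent string literals are concatenated into ONE argument, so the
# second positional argument below is RuntimeWarning, which this
# warning() signature does not accept.
try:
    logger.warning("You are saving a BERTopic model without explicitly defining an embedding model."
                   "If you are using a sentence-transformers model ...", RuntimeWarning)
except TypeError as exc:
    error_text = str(exc)

# One possible fix under the assumption above: pass only the message,
# with a trailing space on each piece so the sentences stay separated.
logger.warning("You are saving a BERTopic model without explicitly defining an embedding model. "
               "If you are using a sentence-transformers model ...")
```

After this runs, error_text holds the same "takes 2 positional arguments but 3 were given" message as the report, and logger.messages contains only the single corrected warning.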
Thanks Maarten, I'm about to finish up for the year, but if this is still open in January I'll submit one then.