Quiet .merge_topics()

Open eschaffn opened this issue 10 months ago • 9 comments

When running the topic_model.merge_topics() function it prints a TQDM progress bar during the merge. I instantiated BERTopic() with verbose=False.

Is there a way to quiet the merge_topics() function?

eschaffn avatar Apr 03 '24 15:04 eschaffn

Could you show which TQDM bar is being shown? Looking at the code of .merge_topics I do not see an additional TQDM bar aside from what might be shown in related functions. Also, which version of BERTopic are you using?

MaartenGr avatar Apr 05 '24 07:04 MaartenGr

image

eschaffn avatar Apr 08 '24 17:04 eschaffn

Maybe this comes from re-running the representation model after it merges?

eschaffn avatar Apr 08 '24 17:04 eschaffn

Hmm, not too sure what is happening here. Could you provide the full logging (including the tqdm bar), along with the full code and the BERTopic version?

MaartenGr avatar Apr 08 '24 17:04 MaartenGr

Function used for merging:

import pandas as pd


def merge_topics(distance_threshold, hierarchy_var, topic_model, data):
    # Collect the topic groups whose merge distance is within the threshold.
    topics_to_merge = []
    for _, row in hierarchy_var.iterrows():
        # Use .iloc for positional access; plain integer keys on a
        # label-indexed Series are deprecated in recent pandas versions.
        distance = row.iloc[-1]  # last column
        if distance <= distance_threshold:
            topics_to_merge.append(row.iloc[2])  # third column

    topic_model.merge_topics(data, topics_to_merge)
    hierarchy_tree = topic_model.hierarchical_topics(data)

    topic_df = pd.DataFrame(
        {"Document": data, "Topic": topic_model.topics_})

    return [hierarchy_tree,
            topic_model,
            topic_df,
            topic_model.visualize_hierarchy()]

Initializing BERTopic like this:

chain = load_qa_chain(Ollama(model="zephyr"), chain_type="stuff")

representation_model = {
    "LLM Summary": LangChain(chain=chain, diversity=0.7, nr_docs=10)
}

topic_model = BERTopic(representation_model=representation_model,
                       verbose=False,
                       language="multilingual",
                       nr_topics="auto")

Version: bertopic 0.16.0 pypi_0 pypi

eschaffn avatar Apr 08 '24 17:04 eschaffn

Hmmm, that might be the LangChain backend that you are using but I'm not sure. Do you also get this progress bar when you run .fit? Also, I'm not seeing you actually fitting the model, is that correct?

MaartenGr avatar Apr 10 '24 09:04 MaartenGr

> Hmmm, that might be the LangChain backend that you are using but I'm not sure. Do you also get this progress bar when you run .fit? Also, I'm not seeing you actually fitting the model, is that correct?

I fit the model using .fit_transform(). I just didn't include that in the code. The progress bar seems to be related to the .merge_topics(). I don't notice the same progress bar during .fit() or .fit_transform().

eschaffn avatar Apr 10 '24 18:04 eschaffn

I found it. It's .hierarchical_topics(). Line 1003 in _bertopic.py.

eschaffn avatar Apr 10 '24 18:04 eschaffn
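
Until that bar respects verbose, one possible workaround is to redirect stderr around the call, since tqdm writes to sys.stderr by default. The sketch below is only an illustration: quiet_stderr and noisy_step are hypothetical names, and noisy_step is a stand-in for topic_model.hierarchical_topics(data) so the example runs without a fitted model.

```python
import contextlib
import io
import sys


@contextlib.contextmanager
def quiet_stderr():
    """Temporarily send everything written to sys.stderr into a buffer."""
    with contextlib.redirect_stderr(io.StringIO()) as sink:
        yield sink


def noisy_step(items):
    # Hypothetical stand-in for topic_model.hierarchical_topics(data):
    # it reports progress on stderr, which is where tqdm writes by default.
    for i, _ in enumerate(items, start=1):
        print(f"progress {i}/{len(items)}", file=sys.stderr)
    return [item.upper() for item in items]


with quiet_stderr() as sink:
    result = noisy_step(["a", "b", "c"])

# The progress lines end up in the buffer instead of the terminal.
captured = sink.getvalue()
```

With a real model, the call inside the with block would be topic_model.hierarchical_topics(data). Note that this also hides anything else written to stderr during the call, including warnings, so it is best kept around that single call.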

Ah, I was looking at .merge_topics since you mentioned it gave a TQDM bar there. No wonder I couldn't find it.

Yes, that should be a rather straightforward change to include that one in the verbose functionality. The only thing that needs to be added to the tqdm call is disable=not self.verbose, and it should work. If you want, a PR would be appreciated!

MaartenGr avatar Apr 14 '24 07:04 MaartenGr
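
For readers unfamiliar with the pattern, here is a minimal sketch of how disable=not self.verbose ties a tqdm bar to a verbose flag. ToyModel and its process method are hypothetical and are not BERTopic's actual code; the file argument is only there so the bar's output can be captured for inspection.

```python
import io

from tqdm import tqdm


class ToyModel:
    """Hypothetical stand-in for a model with a verbose flag (not BERTopic)."""

    def __init__(self, verbose=False):
        self.verbose = verbose

    def process(self, items, file=None):
        results = []
        # disable=True turns tqdm into a silent pass-through iterator,
        # so the bar is only drawn when verbose is True.
        for item in tqdm(items, disable=not self.verbose, file=file):
            results.append(item * 2)
        return results


quiet_log = io.StringIO()
quiet_result = ToyModel(verbose=False).process([1, 2, 3], file=quiet_log)

loud_log = io.StringIO()
loud_result = ToyModel(verbose=True).process([1, 2, 3], file=loud_log)
```

Both calls return the same results; only the verbose=True run writes a progress bar to its log.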