zilch42

22 comments by zilch42

I thought about that, but your KeyBERT example doesn't use `min_df` at all. Every topic should have words in that example. I wonder if it's because there are docs...

Ok, I've figured it out. The docs inside BERTopic get cleaned internally by `_preprocess_text()` before being tokenized, so by creating a vocabulary outside of BERTopic, even if it is created...
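
To make that mismatch concrete, here is a minimal sketch; the regex below is only a rough stand-in for the internal cleaning, not the actual `_preprocess_text()` implementation:

```python
import re
from sklearn.feature_extraction.text import CountVectorizer

docs = ["I couldn't agree more!", "e-mail me at foo@bar.com"]

# Vocabulary built outside BERTopic, on the raw documents
external_vocab = set(CountVectorizer().fit(docs).get_feature_names_out())

# Rough stand-in for the internal cleaning: strip everything that isn't
# alphanumeric or a space before vectorizing (illustrative only, not the
# real _preprocess_text implementation)
cleaned = [re.sub(r"[^A-Za-z0-9 ]+", "", doc) for doc in docs]
internal_vocab = set(CountVectorizer().fit(cleaned).get_feature_names_out())

# Tokens in the external vocabulary that never occur after cleaning
print(external_vocab - internal_vocab)
```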

Sure, most docs in newsgroups have at least one example but try doc[8]. It has a few

```python
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import CountVectorizer
import numpy as np
...
```
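
The snippet above is cut off by the page, so the following is a guess at the same idea rather than the original code: count how many of the `min_df`-filtered vocabulary words actually occur in docs[8] (the `min_df` value here is arbitrary).

```python
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import CountVectorizer

docs = fetch_20newsgroups(subset="all", remove=("headers", "footers", "quotes"))["data"]

# Corpus-wide vocabulary that drops rare words (min_df=10 is arbitrary)
vectorizer = CountVectorizer(min_df=10)
X = vectorizer.fit_transform(docs)

# How many of the surviving vocabulary words occur in this particular document?
print(repr(docs[8]))
print("vocabulary words in docs[8]:", X[8].sum())
```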

I didn't necessarily see it as a bug for my use case. I like the preprocessing, and I wouldn't necessarily want things like "couldn't" being transformed to ["couldn", "t"], which...
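
Not part of the original comment, but for anyone who does want contractions kept whole, here is a sketch with scikit-learn's `CountVectorizer` (the `token_pattern` is an illustrative choice, not BERTopic's default):

```python
from sklearn.feature_extraction.text import CountVectorizer

# The default pattern (r"(?u)\b\w\w+\b") stops at the apostrophe; this
# variant keeps contractions such as "couldn't" as a single token
vectorizer = CountVectorizer(token_pattern=r"(?u)\b\w\w+(?:'\w+)?\b")

print(vectorizer.build_analyzer()("I couldn't agree more"))
# ["couldn't", 'agree', 'more']
```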

Thanks Maarten, I'm about to finish up for the year, but if this is still open in January I'll submit one then.

Thanks Maarten. That's more or less what I'm doing at the moment, except that zeroshot doesn't actually assign the probabilities, so `topic_model.probabilities_` is `nan` and I'm recalculating the zeroshot topic...
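
A minimal sketch of that kind of recalculation, assuming the documents and the zero-shot topic labels are embedded with the same sentence-transformers model (the model name, example docs, and labels are illustrative, not from the comment):

```python
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity

docs = ["the patient was prescribed antibiotics", "the home team won the final"]
zeroshot_topic_list = ["medicine", "sport"]

model = SentenceTransformer("all-MiniLM-L6-v2")
doc_embeddings = model.encode(docs)
topic_embeddings = model.encode(zeroshot_topic_list)

# Cosine similarity of each document to each zero-shot topic label, used
# here as a stand-in for the probabilities the zero-shot step leaves unset
similarities = cosine_similarity(doc_embeddings, topic_embeddings)
print(similarities.argmax(axis=1))  # most similar topic per document
print(similarities.max(axis=1))     # and its similarity score
```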

Thanks Maarten. I look forward to those developments. I initially had a look at what's going on in `visualize_hierarchical_documents` and couldn't make much sense of it, but if I do...
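
For anyone trying to follow along, a minimal usage sketch of that function, assuming a fitted model and precomputed embeddings (the embedding model and document slice are illustrative):

```python
from sklearn.datasets import fetch_20newsgroups
from sentence_transformers import SentenceTransformer
from bertopic import BERTopic

docs = fetch_20newsgroups(subset="all", remove=("headers", "footers", "quotes"))["data"][:2000]

# Precompute embeddings so the same ones can be reused for the visualisation
embedding_model = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = embedding_model.encode(docs, show_progress_bar=False)

topic_model = BERTopic(embedding_model=embedding_model)
topics, probs = topic_model.fit_transform(docs, embeddings)

# Build the topic hierarchy, then plot the documents against it
hierarchical_topics = topic_model.hierarchical_topics(docs)
fig = topic_model.visualize_hierarchical_documents(docs, hierarchical_topics, embeddings=embeddings)
fig.show()
```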

Thanks @trangdata, glad to know there is a method for getting at the data. To my mind it would be intuitive to flatten to the lowest possible level and...

Thanks @trangdata. The updated documentation is definitely clearer.