need some ideas
I am using the BERTopic framework for my tasks, but my data is increasing daily, and I want to perform analysis periodically. Do you have any suggestions? Thank you.
Same issue here, looking for best practices for a similar use case: topic modeling on a dataset that grows over time, while avoiding re-tuning hyperparameters every month.
I would generally advise using the merge_models functionality for this, as it allows training new models and iteratively merging them. It also makes things a bit more flexible, since models with different parameters can be merged.
Thank you Maarten, do you advise any specific hyperparameters for that kind of use case? I mean, to avoid having to rework them frequently and just let the model live. It can be tricky due to the high volatility of UMAP.
In my experience, I seldom have to change the parameters of UMAP to get the kind of dimensionality reduction that I need. The only reason to do so is if the data size changed drastically (from millions to hundreds), but in those cases HDBSCAN is more finicky to control than UMAP. With HDBSCAN, it is often about tuning min_cluster_size.