Top2Vec
Using a pre-trained doc2vec model
Hello!
I have a text-mining use case with one overarching document set consisting of many smaller sub-sets of documents. I want to train a topic model for each smaller sub-set, but these sometimes don't contain enough documents on their own. Besides, I would rather use the knowledge of the entire document set when building a topic model for each sub-set.
Training an SBERT encoder on this dataset works, but it does not provide a tangible improvement over the standard doc2vec option, and it takes a very long time.
So I was wondering: is there a way to train a doc2vec model on the entire document set once, and then use it in Top2Vec to build a topic model for each subset of documents, instead of building a doc2vec model from scratch each time? And maybe there are other options I am not aware of?
You can train a doc2vec model on your whole dataset and then pass it as a callable to embedding_model.