Top2Vec
Exception thrown on small corpus
model = Top2Vec(documents=newsgroups.data[1:100], speed="learn", min_count=1)
fails with
2022-05-11 19:03:47,753 - top2vec - INFO - Finding topics
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/carsten/.conda/envs/top2vec-2/lib/python3.9/site-packages/top2vec/Top2Vec.py", line 684, in __init__
self._create_topic_vectors(cluster.labels_)
File "/home/carsten/.conda/envs/top2vec-2/lib/python3.9/site-packages/top2vec/Top2Vec.py", line 859, in _create_topic_vectors
np.vstack([self.document_vectors[np.where(cluster_labels == label)[0]]
File "<__array_function__ internals>", line 5, in vstack
File "/home/carsten/.conda/envs/top2vec-2/lib/python3.9/site-packages/numpy/core/shape_base.py", line 283, in vstack
return _nx.concatenate(arrs, 0)
File "<__array_function__ internals>", line 5, in concatenate
ValueError: need at least one array to concatenate
I suppose it's because only a single topic (or none at all) is found.
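The failure mode can be sketched with plain NumPy (a minimal reconstruction based on the traceback, not the exact library source): HDBSCAN labels noise points as -1, and on a tiny corpus every document can end up as noise, leaving an empty label set to stack.

```python
import numpy as np

# 99 documents, as in newsgroups.data[1:100]; vector size is arbitrary here.
document_vectors = np.random.rand(99, 300)

# Hypothetical worst case: HDBSCAN classifies every document as noise (-1).
cluster_labels = np.full(99, -1)

unique_labels = set(cluster_labels)
unique_labels.discard(-1)  # drop the noise label -> empty set

try:
    # Mirrors the np.vstack call in _create_topic_vectors: with no labels
    # left, the list comprehension is empty and concatenate has nothing to do.
    topic_vectors = np.vstack(
        [document_vectors[np.where(cluster_labels == label)[0]].mean(axis=0)
         for label in sorted(unique_labels)]
    )
except ValueError as e:
    print(e)  # need at least one array to concatenate
```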
Yes, you will need a larger dataset.
OK, but it would be nice if it didn't crash. A single-topic result is a valid result too, in my view.
Is there any documentation on why a minimum corpus size is needed, or on how result quality changes as the corpus grows?