
Exception thrown on small corpus

Open behrica opened this issue 2 years ago • 3 comments

 model = Top2Vec(documents=newsgroups.data[1:100], speed="learn", min_count=1)

fails with

2022-05-11 19:03:47,753 - top2vec - INFO - Finding topics
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/carsten/.conda/envs/top2vec-2/lib/python3.9/site-packages/top2vec/Top2Vec.py", line 684, in __init__
    self._create_topic_vectors(cluster.labels_)
  File "/home/carsten/.conda/envs/top2vec-2/lib/python3.9/site-packages/top2vec/Top2Vec.py", line 859, in _create_topic_vectors
    np.vstack([self.document_vectors[np.where(cluster_labels == label)[0]]
  File "<__array_function__ internals>", line 5, in vstack
  File "/home/carsten/.conda/envs/top2vec-2/lib/python3.9/site-packages/numpy/core/shape_base.py", line 283, in vstack
    return _nx.concatenate(arrs, 0)
  File "<__array_function__ internals>", line 5, in concatenate
ValueError: need at least one array to concatenate

I suppose it's due to only a single topic being found, or something similar.
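The same ValueError can be reproduced with NumPy alone, which may help narrow this down. This is only a sketch under an assumption not confirmed in this thread: that the clusterer labels every document as noise (-1) and that the noise label is dropped before the per-topic mean vectors are stacked. The names `document_vectors` and `cluster_labels` here are stand-ins with made-up data, not the library's real state.

```python
import numpy as np

# Stand-in data (hypothetical): 5 document vectors, all labelled as noise (-1).
rng = np.random.default_rng(0)
document_vectors = rng.normal(size=(5, 3))
cluster_labels = np.full(5, -1)

# Assumption: the noise label is excluded before topic vectors are built,
# so no labels remain when every document is noise.
unique_labels = set(cluster_labels.tolist()) - {-1}

try:
    # With no labels left, the list is empty and np.vstack has nothing to stack.
    topic_vectors = np.vstack([
        document_vectors[np.where(cluster_labels == label)[0]].mean(axis=0)
        for label in unique_labels
    ])
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```

If that assumption holds, the crash would correspond to zero clusters being found rather than exactly one.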

behrica avatar May 11 '22 17:05 behrica

Yes, you will need a larger dataset.

ddangelov avatar Nov 13 '22 21:11 ddangelov

Ok, but it would be nice not to crash. A "single topic" result is a valid result as well, in my view.
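To illustrate what "not crashing" could look like, here is a minimal sketch of a guard around the stacking step. The helper `topic_vectors_or_empty` is hypothetical and is not part of Top2Vec; it assumes noise is labelled -1 and simply returns an empty (0, dim) array when no cluster is found. Whether the rest of the library could cope with zero topics is not something this sketch checks.

```python
import numpy as np

def topic_vectors_or_empty(document_vectors, cluster_labels):
    """Hypothetical helper: one mean vector per cluster, or an empty (0, dim) array."""
    # Assumption: -1 marks noise and must not become a topic.
    labels = sorted(set(cluster_labels.tolist()) - {-1})
    if not labels:
        # No clusters found: return an empty result instead of raising.
        return np.empty((0, document_vectors.shape[1]))
    return np.vstack([
        document_vectors[cluster_labels == label].mean(axis=0)
        for label in labels
    ])

# Made-up vectors for illustration only.
vectors = np.arange(12.0).reshape(4, 3)
no_topics = topic_vectors_or_empty(vectors, np.array([-1, -1, -1, -1]))
two_topics = topic_vectors_or_empty(vectors, np.array([0, 0, 1, -1]))
print(no_topics.shape, two_topics.shape)
```

A caller could then test `len(result) == 0` and report "no topics found" rather than hitting a traceback.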

behrica avatar Nov 14 '22 18:11 behrica

Is there any documentation on why a minimum corpus size is needed, or on how the quality of results changes as the corpus grows?

ClaytonSmith avatar Dec 04 '22 18:12 ClaytonSmith