
Negative coherence on short texts

Open • elbadma opened this issue 3 years ago • 1 comment

Hi, I saw that one can use DETM on short texts. I tried ETM on short texts (each text contains only one sentence) and it seemed to work. However, the coherence score became negative. How should I interpret it? Does lower coherence always mean worse, or are scores closer to 0 worse? Whenever I try ETM on normal-length texts (more than one sentence), the coherence is always positive, so I assume the negative coherence is caused by the short length.

elbadma, Mar 17 '21 08:03

Hi! Coherence is computed as normalized pointwise mutual information (NPMI), which ranges between -1 and 1, so scores below 0 are fine. A negative score means the top words of a topic co-occur less often than they would if they were independent, and higher scores still indicate more coherent topics. Negative values usually show up with short-text documents because the documents are much sparser and words co-occur less frequently. Just make sure not to compare coherence scores computed on two different datasets.
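To make the range concrete, here is a minimal sketch of document-level NPMI in plain Python. It is not the library's implementation: the function names `npmi` and `topic_coherence`, the toy corpora, and the edge-case conventions (-1 for pairs that never co-occur, 0 for pairs present in every document) are assumptions made for illustration only.

```python
import math
from itertools import combinations


def npmi(word_a, word_b, documents):
    """NPMI of two words, using document-level co-occurrence counts."""
    n_docs = len(documents)
    doc_sets = [set(doc) for doc in documents]
    p_a = sum(word_a in d for d in doc_sets) / n_docs
    p_b = sum(word_b in d for d in doc_sets) / n_docs
    p_ab = sum(word_a in d and word_b in d for d in doc_sets) / n_docs
    if p_ab == 0:
        # Assumed convention: words that never co-occur get the minimum score.
        return -1.0
    if p_ab == 1:
        # Assumed convention: both words appear in every document, so the
        # ratio is 0/0; treat the pair as independent.
        return 0.0
    return math.log(p_ab / (p_a * p_b)) / -math.log(p_ab)


def topic_coherence(top_words, documents):
    """Average NPMI over all pairs of a topic's top words."""
    pairs = list(combinations(top_words, 2))
    return sum(npmi(a, b, documents) for a, b in pairs) / len(pairs)


# One-sentence documents: the two topic words never share a document.
short_docs = [["cat", "purrs"], ["dog", "barks"], ["cat", "sleeps"], ["dog", "runs"]]

# Longer documents: the two topic words always appear together.
long_docs = [["cat", "dog", "pet", "food"], ["cat", "dog", "vet"],
             ["stock", "market"], ["stock", "price"]]

short_score = topic_coherence(["cat", "dog"], short_docs)
long_score = topic_coherence(["cat", "dog"], long_docs)
print(short_score, long_score)
```

The same pair of topic words scores at the bottom of the range on the sparse corpus and at the top on the denser one, which is why scores are only comparable within a single dataset.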

silviatti, Mar 23 '21 08:03