
0 weight for unassociated words

Open mheilman opened this issue 12 years ago • 1 comment

In class_lm_cluster.compute_weight(), if two words never occur next to each other (i.e., paircount == 0), the function returns 0.0 for the weight. Is this the appropriate behavior, given that without the check on paircount it would otherwise compute 0.0 * log(0.0), which is nan?
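
For reference, here is a minimal sketch of the usual convention, not the actual body of compute_weight() (which isn't quoted in this thread). It assumes the weight is a mutual-information-style term p(a,b) * log(p(a,b) / (p(a) * p(b))); the function name `mi_term` and the `total` parameter are hypothetical. Since x * log(x) tends to 0 as x approaches 0 from above, treating an unseen pair as contributing 0.0 matches the limit. Note also that Python's math.log(0.0) raises ValueError rather than returning nan, so some guard is needed either way.

```python
from math import log


def mi_term(paircount, count1, count2, total):
    """Hypothetical mutual-information-style term for one word pair.

    Computes p(a,b) * log(p(a,b) / (p(a) * p(b))) from raw counts.
    """
    if paircount == 0:
        # Convention: x * log(x) -> 0 as x -> 0+, so an unseen pair
        # contributes nothing. This also avoids calling log(0).
        return 0.0
    p_pair = paircount / total
    # p(a,b) / (p(a) * p(b)) simplifies to paircount * total / (count1 * count2)
    ratio = paircount * total / (count1 * count2)
    return p_pair * log(ratio)


if __name__ == "__main__":
    print(mi_term(0, 3, 4, 10))  # unseen pair
    print(mi_term(4, 4, 5, 10))  # pair seen more often than chance
```

Under that assumption, returning 0.0 for paircount == 0 looks like the intended limit value rather than an arbitrary fallback.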

mheilman avatar Sep 11 '13 22:09 mheilman

I also got

<ipython-input-7-1e2bd0465582> in make_pair_scores(self, pair_iter)
    211             # note that these counts are ints!
    212             # (but the log function returns floats)
--> 213             score = log(paircount) - log(self.word_counts[c1]) - log(self.word_counts[c2])
    214 
    215             self.current_batch_scores[c1][c2] = score

ValueError: math domain error

which I guess is because of log(0.0)
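
A possible workaround, sketched as a standalone function rather than a patch to make_pair_scores() (the name `safe_pair_score` is made up, and returning -inf for a zero count is my assumption, skipping such pairs would be another option):

```python
from math import log


def safe_pair_score(paircount, count1, count2):
    """Same score as line 213 of the traceback, but guarded against zero counts.

    Hypothetical helper: returns -inf instead of raising ValueError
    when any count is zero (an assumed choice, not the project's behavior).
    """
    if paircount <= 0 or count1 <= 0 or count2 <= 0:
        # math.log(0) raises "ValueError: math domain error",
        # so handle zero counts before taking any logs.
        return float("-inf")
    return log(paircount) - log(count1) - log(count2)


if __name__ == "__main__":
    print(safe_pair_score(0, 3, 4))  # zero pair count no longer raises
    print(safe_pair_score(2, 4, 5))
```

With -inf as the score, a pair with a zero count would simply rank last among merge candidates.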

luthfianto avatar Jun 18 '17 15:06 luthfianto