tan-clustering
0 weight for unassociated words
In class_lm_cluster.compute_weight(), if two words never occur next to each other (i.e., paircount == 0), the function returns 0.0 for the weight. Is this the appropriate behavior, given that without the check on paircount it would evaluate 0.0 * log(0.0), which is not a valid number (nan, or a math domain error if log is Python's math.log)?
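For context, returning 0.0 matches the usual convention in entropy and mutual-information sums, where 0 * log(0) is taken to be 0 because x * log(x) tends to 0 as x approaches 0 from above. Here is a minimal sketch of that convention. The function name xlogx_term and its probability arguments are hypothetical and are not the actual body of compute_weight():

```python
import math


def xlogx_term(p_joint, p_a, p_b):
    """Hypothetical mutual-information-style term: p * log(p / (p_a * p_b)).

    This is NOT the real compute_weight(); it only illustrates the
    0 * log(0) = 0 convention discussed above.
    """
    if p_joint == 0.0:
        # lim_{x -> 0+} x * log(x) = 0, so a pair that never co-occurs
        # contributes nothing to the sum.
        return 0.0
    return p_joint * math.log(p_joint / (p_a * p_b))
```

Under that reading, the check on paircount is the standard way to avoid evaluating log(0) and is not a special case that changes the result.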
I also got this error:
<ipython-input-7-1e2bd0465582> in make_pair_scores(self, pair_iter)
211 # note that these counts are ints!
212 # (but the log function returns floats)
--> 213 score = log(paircount) - log(self.word_counts[c1]) - log(self.word_counts[c2])
214
215 self.current_batch_scores[c1][c2] = score
ValueError: math domain error
which I guess is caused by log(0.0), i.e., paircount being 0.
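One possible guard, as a sketch only: Python's math.log raises ValueError for 0, so the score on line 213 needs an explicit check before taking the log. The stand-alone function pair_score below is hypothetical. Returning negative infinity for a zero count is an assumption about what the clustering should do with such pairs; skipping the pair entirely may be the better fit.

```python
import math


def pair_score(paircount, count1, count2):
    """Hypothetical stand-alone version of the score computed on line 213."""
    if paircount == 0:
        # math.log(0) raises "ValueError: math domain error";
        # the mathematical limit of log(x) as x -> 0+ is -infinity.
        return float("-inf")
    # Counts are ints, but math.log returns floats.
    return math.log(paircount) - math.log(count1) - math.log(count2)
```

With this guard, a pair that never co-occurs gets the lowest possible score and no exception is raised.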