
HDP Model document-topic distribution and topic-word distribution do not sum to 1

Open · alexs131 opened this issue · 1 comment

Hello, I have encountered an issue where the topic-word distribution also does not sum to 1. I am running version 0.12.1 with the hyperparameters tw=TermWeight.PMI, gamma=1, alpha=0.1, eta=0.001, initial_k=20, seed=1. I have previously run the HDP model on a different, larger dataset and did not encounter this issue.

Thanks for any help here and apologies if this is a misunderstanding on my part.

alexs131, Aug 19 '21 14:08

Hi @alexs131, thank you for reporting the bug. It seems to be caused by floating-point precision errors.

https://github.com/bab2min/tomotopy/blob/926f6ff34599a19d20b322f8b1a13fe66e8c5986/src/TopicModel/HDPModel.hpp#L493-L506

Currently, the numerator (doc.numByTopic) and the denominator (doc.getSumWordWeight()) of the topic distribution are stored separately, and it seems that rounding errors in these values accumulate during training, especially on smaller datasets.
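To illustrate the mechanism described above, here is a minimal, self-contained sketch in plain Python. It is not tomotopy code: it assumes hypothetical non-integer word weights, uses random single-word topic reassignments in place of real sampling, and emulates float32 storage with struct. The names simulate and f32 are illustrative only. It keeps per-topic numerators and a separately stored denominator, then compares dividing by that stored denominator against renormalizing by the actual sum of the numerators.

```python
import random
import struct


def f32(x: float) -> float:
    """Round a Python float (float64) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def simulate(num_words=50, num_topics=5, moves=2000, seed=1):
    """Toy model of one document whose per-topic weights are updated in float32.

    Returns (num_by_topic, denom): the per-topic numerators and the
    separately stored total word weight used as the denominator.
    """
    rng = random.Random(seed)
    # Hypothetical non-integer word weights, standing in for a
    # non-uniform term weighting such as PMI.
    weights = [f32(rng.uniform(0.05, 3.0)) for _ in range(num_words)]
    # Denominator: total weight, computed once and stored on its own.
    denom = f32(sum(weights))

    # Numerators: per-topic weight sums, accumulated in float32.
    assign = [rng.randrange(num_topics) for _ in range(num_words)]
    num_by_topic = [0.0] * num_topics
    for w, k in zip(weights, assign):
        num_by_topic[k] = f32(num_by_topic[k] + w)

    # Random reassignments stand in for sampling steps: each move
    # subtracts a weight from one topic and adds it to another, and
    # every float32 update may introduce a small rounding error.
    for _ in range(moves):
        i = rng.randrange(num_words)
        old, new = assign[i], rng.randrange(num_topics)
        num_by_topic[old] = f32(num_by_topic[old] - weights[i])
        num_by_topic[new] = f32(num_by_topic[new] + weights[i])
        assign[i] = new
    return num_by_topic, denom


if __name__ == "__main__":
    num, denom = simulate()
    # Dividing by the separately stored denominator: the sum can drift from 1.
    print("sum with stored denominator:", sum(n / denom for n in num))
    # Dividing by the actual sum of the numerators: 1 up to float64 rounding.
    total = sum(num)
    print("sum after renormalizing:   ", sum(n / total for n in num))
```

Renormalizing by the actual sum of the numerators is one possible workaround on the user side; whether the library itself should do this is a separate design question.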

I'll investigate this problem further.

bab2min, Aug 24 '21 12:08