tomotopy
HDP Model document-topic distribution and topic-word distribution do not sum to 1
Hello, I have encountered an issue where the topic-word distribution also does not sum to 1. I am running version 0.12.1 with hyperparameters tw=TermWeight.PMI, gamma=1, alpha=0.1, eta=0.001, initial_k=20, seed=1.
I have run the HDP model previously on a different, larger dataset, and did not encounter this issue.
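For reference, a minimal sketch of the kind of check that exposes the problem. The helper names and the default tolerance are hypothetical choices for illustration, not part of tomotopy. With a trained model, the lists passed in would be the results of get_topic_word_dist(k) on the model and get_topic_dist() on a document.

```python
import math


def sum_deviation(dist):
    """Absolute difference between the sum of a distribution and 1."""
    # math.fsum avoids adding summation error of its own to the check.
    return abs(math.fsum(dist) - 1.0)


def is_normalized(dist, tol=1e-4):
    """True if dist sums to 1 within tol (tol is an arbitrary choice)."""
    return sum_deviation(dist) <= tol


# Stand-in values; a real check would pass the model's distributions.
print(is_normalized([0.2, 0.3, 0.5]))
print(is_normalized([0.2, 0.3, 0.4]))
```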
Thanks for any help here and apologies if this is a misunderstanding on my part.
Hi @alexs131, thank you for reporting the bug. It seems to be a problem with floating-point precision errors.
https://github.com/bab2min/tomotopy/blob/926f6ff34599a19d20b322f8b1a13fe66e8c5986/src/TopicModel/HDPModel.hpp#L493-L506
Currently, the numerator (doc.numByTopic) and denominator (doc.getSumWordWeight()) of the topic distribution are stored separately, and it seems that errors in these values accumulate during the training process, especially on smaller datasets.
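To illustrate the suspected mechanism, here is a toy simulation. It is not tomotopy's code: it assumes single-precision storage (emulated with struct), fractional term weights, and random topic reassignment in place of real sampling. The names simulate, f32, num_by_topic and sum_word_weight are chosen for illustration only. It keeps a per-topic numerator and a total denominator separately, updates the numerators incrementally, and compares the result with a distribution recomputed from the current assignments.

```python
import math
import random
import struct


def f32(x):
    """Round a Python float to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def simulate(num_topics=20, num_words=200, sweeps=500, seed=1):
    """Toy sampler: returns (incrementally updated dist, recomputed dist)."""
    rng = random.Random(seed)
    # Fractional weights stand in for a non-integer term weighting scheme.
    weights = [rng.uniform(0.01, 3.0) for _ in range(num_words)]
    assign = [rng.randrange(num_topics) for _ in range(num_words)]

    # Numerator (per-topic weight) and denominator (total weight) are
    # accumulated separately, both rounded to single precision.
    num_by_topic = [0.0] * num_topics
    sum_word_weight = 0.0
    for w, z in zip(weights, assign):
        num_by_topic[z] = f32(num_by_topic[z] + w)
        sum_word_weight = f32(sum_word_weight + w)

    # Each reassignment subtracts from one topic and adds to another;
    # every step rounds, so the numerators can slowly drift.
    for _ in range(sweeps):
        for i, w in enumerate(weights):
            old, new = assign[i], rng.randrange(num_topics)
            num_by_topic[old] = f32(num_by_topic[old] - w)
            num_by_topic[new] = f32(num_by_topic[new] + w)
            assign[i] = new

    drifted = [n / sum_word_weight for n in num_by_topic]

    # Recomputing both quantities from the current assignments
    # discards whatever error the incremental updates accumulated.
    exact = [0.0] * num_topics
    for w, z in zip(weights, assign):
        exact[z] += w
    total = math.fsum(exact)
    recomputed = [n / total for n in exact]
    return drifted, recomputed


drifted, recomputed = simulate()
print("incremental deviation from 1:", abs(math.fsum(drifted) - 1.0))
print("recomputed deviation from 1: ", abs(math.fsum(recomputed) - 1.0))
```

In this sketch the total weight is only a few hundred, so each rounding step is large relative to the denominator. That would be consistent with the effect showing up more on a small dataset, though whether the same thing happens inside the library is still to be confirmed.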
I'll investigate this problem more.