Wrong power base in LDA Model log_perplexity documentation
Problem description
The Gensim LDAModel documentation is incorrect.
Steps/code/corpus to reproduce
Based on the code in log_perplexity, the documented formula should be e^(-bound) rather than 2^(-bound), since all of the functions used to compute the bound appear to use the natural logarithm (base e).
Thank you for pointing this out.
Could you please be more specific? What documentation, what file, what part of that file in particular?
https://github.com/RaRe-Technologies/gensim/blob/develop/gensim/models/ldamodel.py
The log_perplexity function where it says:
Calculate and return per-word likelihood bound, using a chunk of documents as evaluation corpus. Also output the calculated statistics, including the perplexity=2^(-bound), to log at INFO level.
If you look at the source code, you'll see that the base is 2:
https://github.com/RaRe-Technologies/gensim/blob/e1025743dd022ff87b55dde8ef2c85167d2e469d/gensim/models/ldamodel.py#L824
This appears to be correct (it matches the docstring). Where are you seeing e as a base?
If you look at the bound function in ldamodel.py, all of the functions used there rely on the natural log rather than base 2.
I ran into the same issue. At the INFO log level, it prints the perplexity as a power of 2, but the function returns a value computed in base e. The documentation should point this out, because it is confusing.
So how can I calculate the correct perplexity with gensim?
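One possible approach, assuming the observation in this thread is right and the per-word bound returned by log_perplexity is measured in nats (natural log): exponentiate the returned value yourself with base e instead of relying on the 2^(-bound) figure written to the log. The sketch below uses only the standard library. The helper names and the sample bound of -7.5 are hypothetical and chosen for illustration; they are not part of gensim.

```python
import math


def perplexity_from_per_word_bound(per_word_bound, log_base=math.e):
    """Turn a per-word log-likelihood bound into a perplexity.

    log_base must match the logarithm used to compute the bound.
    Assumption (from this thread): gensim's bound uses the natural log,
    so the default base is e.
    """
    return log_base ** (-per_word_bound)


def nats_to_bits(value_in_nats):
    """Convert a log value from base e (nats) to base 2 (bits)."""
    return value_in_nats / math.log(2)


# Hypothetical per-word bound, standing in for the return value of
# lda.log_perplexity(corpus) on a trained model.
per_word_bound = -7.5

ppl_base_e = perplexity_from_per_word_bound(per_word_bound)        # e^(7.5)
ppl_base_2 = perplexity_from_per_word_bound(per_word_bound, 2.0)   # 2^(7.5), what the log line reports

# Converting the bound to bits first makes base 2 agree with base e.
ppl_from_bits = perplexity_from_per_word_bound(nats_to_bits(per_word_bound), 2.0)

print(ppl_base_e, ppl_base_2, ppl_from_bits)
```

With a trained model `lda` and an evaluation corpus `corpus`, the same idea would be `math.exp(-lda.log_perplexity(corpus))`. If the bound turns out to be in bits after all, pass `log_base=2.0` instead; the two results differ because e^(-b) = 2^(-b / ln 2).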