Wrong power base in LDA Model log_perplexity documentation

Open mf908 opened this issue 5 years ago • 6 comments

Problem description

The Gensim LdaModel documentation for log_perplexity appears to state the wrong power base for the perplexity.

Steps/code/corpus to reproduce

Based on the code in log_perplexity, the perplexity should be e^(-bound) rather than 2^(-bound), since all of the functions used to compute the bound appear to use the natural logarithm (base e).

mf908 avatar Oct 07 '19 14:10 mf908

Thank you for pointing this out.

Could you please be more specific? What documentation, what file, what part of that file in particular?

mpenkov avatar Oct 12 '19 06:10 mpenkov

https://github.com/RaRe-Technologies/gensim/blob/develop/gensim/models/ldamodel.py

The log_perplexity function where it says:

Calculate and return per-word likelihood bound, using a chunk of documents as evaluation corpus. Also output the calculated statistics, including the perplexity=2^(-bound), to log at INFO level.

mf908 avatar Oct 14 '19 15:10 mf908

If you look at the source code, you'll see that the base is 2:

https://github.com/RaRe-Technologies/gensim/blob/e1025743dd022ff87b55dde8ef2c85167d2e469d/gensim/models/ldamodel.py#L824

This appears to be correct (matches the docstring). Where are you seeing e as a base?

mpenkov avatar Oct 21 '19 17:10 mpenkov

If you look at the bound function in ldamodel.py, all of the functions there use the natural log rather than base 2.

mf908 avatar Nov 07 '19 17:11 mf908

I ran into the same issue. In the INFO log output, it prints the perplexity as an exponential with base 2, but the function returns a bound computed with base e. The documentation should point this out, since it is a bit confusing.

Xilorole avatar Feb 20 '21 15:02 Xilorole

So how can I calculate the correct perplexity with gensim?

Lehas-sudo avatar Jun 23 '23 15:06 Lehas-sudo
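
A minimal sketch of one way to do this, assuming the reading in this thread is right and the per-word bound returned by log_perplexity is measured in nats (natural log). The helper names perplexity_from_bound and nats_to_bits are made up for illustration and are not part of gensim, and the value -7.5 is a placeholder rather than a real result. With a trained model it would come from something like lda.log_perplexity(chunk), where lda and chunk are your own model and evaluation corpus.

```python
import math


def perplexity_from_bound(per_word_bound, bound_in_nats=True):
    """Convert a per-word log-likelihood bound into a perplexity.

    If the bound was computed with natural logs (the assumption made
    in this thread), exponentiate with base e. Otherwise treat it as
    a base-2 quantity.
    """
    if bound_in_nats:
        return math.exp(-per_word_bound)
    return 2.0 ** (-per_word_bound)


def nats_to_bits(value):
    """Rescale a natural-log quantity to base 2."""
    return value / math.log(2.0)


# Placeholder standing in for the value returned by log_perplexity.
per_word_bound = -7.5

# Perplexity if the bound is in nats: e^(-bound).
print(perplexity_from_bound(per_word_bound))

# What 2^(-bound) gives when applied directly to the same number.
print(perplexity_from_bound(per_word_bound, bound_in_nats=False))

# Rescaling to bits first makes the base-2 formula agree with e^(-bound).
print(perplexity_from_bound(nats_to_bits(per_word_bound), bound_in_nats=False))
```

The first and third printed values agree, while the second is smaller. That gap is the discrepancy being discussed here: 2^(-bound) only matches e^(-bound) if the bound is first divided by ln 2.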