warplda
Performance on the NYTimes dataset
Hi, WarpLDA is very cool! I have some trouble evaluating its performance and need your help.
I got a log_likelihood (per token) of -10.426584 for the NYTimes dataset after 100 iterations with mh=2. Since the dataset has 99,542,125 tokens in total (about 100M), the total log_likelihood should be about -1.04e+9. But this result is inconsistent with Fig. 5, row 1, where the log_likelihood at iteration 100 is larger than -1e+9.
BTW: can the code run in distributed mode?
Hi kyhhdm,
The log likelihood is reported per token, i.e., it is divided by the number of tokens.
This open-source version cannot run in distributed mode. We do have a (preliminary) distributed LDA codebase; please check https://github.com/thu-ml/BigTopicModel
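To make the per-token convention concrete, here is a minimal sketch using the numbers from the question above (the variable names are illustrative, not WarpLDA's):

```python
# Convert WarpLDA's reported per-token log-likelihood back to a total
# log-likelihood by multiplying by the token count.
per_token_ll = -10.426584   # value printed by WarpLDA (per token)
num_tokens = 99542125       # total tokens in the NYTimes dataset

total_ll = per_token_ll * num_tokens
print(f"total log-likelihood = {total_ll:.3e}")  # prints -1.038e+09
```

So the reported -10.426584 and a total of about -1.04e+9 are the same quantity in two different units, which is consistent with the figure.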
Hello, I am trying to run some comparisons of running time, log-likelihood, etc. against Petuum (PMLS), for example.
I have noticed WarpLDA can output both log-likelihood (per iteration) and perplexity (with the -perplexity flag), but there is no direct way to get word_loglikelihood, doc_loglikelihood, or total_loglikelihood without adding code. Quickly looking through the code, I found that the definition of perplexity used is perplexity = e^(-L/NE), where L is the total log-likelihood and NE is the total number of tokens.
Could you help me clarify these values? Should loglikelihood (per token) * number_of_tokens equal -ln(perplexity) * number_of_tokens? If not, is the log-likelihood (per token) in each iteration just an estimate, or is it the total combined log-likelihood? And is the log-likelihood used for perplexity only the model log-likelihood, or the full total log-likelihood?
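To make the relation I am asking about concrete, here is what I am assuming in code. If perplexity really is defined as e^(-L/NE), then per-token log-likelihood and -ln(perplexity) must be the same number (all names here are illustrative, not from WarpLDA's source):

```python
import math

# Assumed definition from reading the code:
#   perplexity = exp(-L / NE)
# where L is the total log-likelihood and NE the total token count.
L = -1.0379e9      # example total log-likelihood
NE = 99542125      # total number of tokens

perplexity = math.exp(-L / NE)
per_token_ll = L / NE

# Under this definition the two reported quantities coincide:
#   per_token_ll == -ln(perplexity)
assert abs(per_token_ll + math.log(perplexity)) < 1e-9
```

If that identity does not hold for WarpLDA's actual output, then the two numbers must be computed from different likelihoods (e.g. model-only vs. total), which is exactly what I would like confirmed.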
Many thanks!
Any updates on @mromaios's question? Could you please clarify these values and their relation: log-likelihood (per token) and perplexity? And how can we get the corresponding per-document log-likelihood in both the train and test cases?
Thanks in advance!