gensim
gensim copied to clipboard
Callbacks for LDAMultiCore
This PR upgrades the multi-core implementation of LDA
to use callbacks 💪.
Callbacks are critical for model evaluation in general, and have been requested in past for Gensim's model in particular 🙏.
A usage example on News20 dataset:
from gensim.models import LdaMulticore
from gensim.models.callbacks import CoherenceMetric, PerplexityMetric
from gensim.models import LdaMulticore, LdaModel
callback1 = CoherenceMetric(corpus=mm_corpus, dictionary=dictionary, coherence='u_mass', title='u_mass')
callback2 = CoherenceMetric(corpus=mm_corpus, texts=docs_tokenized, dictionary=dictionary, coherence='c_v', title='c_v',)
lda = LdaMulticore(mm_corpus, id2word=dictionary, num_topics=20, passes=20, batch=False, callbacks=[callback1,callback2])
# evaluation
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
metrics = pd.DataFrame(lda.metrics)
metrics.reset_index(names=['epoch'], inplace=True)
metrics['epoch'] = metrics['epoch']+1
fig,ax1 = plt.subplots()
ln1=ax1.plot(metrics['epoch'],metrics['u_mass'],label='$U_{mass}$',color='tab:red')
ax1.set_xlabel('epoch')
ax1.set_ylabel('$U_{mass}$')
ax2 = ax1.twinx()
ln2 = ax2.plot(metrics['epoch'],metrics['c_v'],label='$C_v$',color='tab:blue')
ax2.set_ylabel('$C_v$')
lines = ln1+ln2
labels = [l.get_label() for l in lines]
ax2.legend(lines, labels, loc=0)
plt.show()
This illustrates the point of using callbacks: we know how many epochs are sufficient to converge 🆒
Also, the doc string has been made more accurate:
callbacks : list of :class:`~gensim.models.callbacks.Callback`
Metric callbacks to log evaluation metrics of the model at every training epoch.
For a full example see this Kaggle notebook.
DISCLAIMER: this is a byproduct of the implementation for the purpose of a research paper.
@maciejskorski Looks like some tests in your PR are failing. Are you able to fix them?
Closing as stale. Are you still using LDA in 2024? What is your use-case here? Thanks.