
Any suggestion on multiple topics for one document?

Open WalterKung opened this issue 6 years ago • 4 comments

Thank you for your work on the ETM model. I applied ETM to my documents, and it gave clearer-cut topics than LDA did.

The original LDA can assign multiple topics to a single document. In the paper, you use a softmax over the topic embeddings to get theta, and the softmax tends to assign one dominant topic per document. I am wondering if you can give me some suggestions on how to use ETM to get multiple topics from a single document. I am using get_theta(normalized_data_batch) to get the topic distribution.
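One way to read multiple topics off a softmax-normalized theta, rather than taking the argmax, is to keep the top-k topics per document and drop any whose proportion falls below a threshold. The sketch below assumes theta has already been computed (e.g. via get_theta); the function name and thresholds are illustrative, not part of the ETM API:

```python
import numpy as np

def top_topics(theta, k=3, min_prob=0.05):
    """Return up to k topic indices per document, keeping only
    topics whose proportion is at least min_prob."""
    theta = np.asarray(theta)
    order = np.argsort(-theta, axis=1)[:, :k]          # top-k topics per row
    return [
        [t for t in row if theta[i, t] >= min_prob]    # drop weak topics
        for i, row in enumerate(order)
    ]

# Example: one peaked document and one mixed document
theta = np.array([[0.90, 0.04, 0.03, 0.03],
                  [0.40, 0.35, 0.20, 0.05]])
print(top_topics(theta))   # [[0], [0, 1, 2]]
```

Because theta is a proper distribution over topics, this degrades gracefully: a document the model considers single-topic still comes back with one topic, while genuinely mixed documents keep several.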

https://github.com/WalterKung/DataConference2020/blob/master/P2_TOPIC_MODEL/SS_TOPIC_MODEL_Stock_by_news.ipynb

WalterKung avatar Oct 17 '19 00:10 WalterKung

Is there any chance that you could explain the parameters? I'm having a bit of trouble using them properly. An example would be really helpful.

dubbsbrandon avatar Oct 25 '19 17:10 dubbsbrandon

Could this be related to the isotropic Gaussian prior over the theta logits (as in typical VAEs)?

gokceneraslan avatar Mar 01 '20 13:03 gokceneraslan

I guess that, given the nature of the logistic-normal, a smaller sigma on the Gaussian prior would give you smoother topic proportions. However, this implementation does not seem to allow a configurable sigma: the Gaussian prior is hard-coded to (mu=0, sigma=1) in the encoder. Could you change that and report back here?
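For reference, a configurable prior sigma would enter through the encoder's KL term. The closed form for KL(N(mu, sigma^2) || N(0, sigma_p^2)) is standard; the sketch below is not the repo's code, just a numpy illustration of where the hard-coded sigma=1 sits and what loosening it would look like:

```python
import numpy as np

def kl_to_prior(mu, logsigma, prior_sigma=1.0):
    """KL( N(mu, sigma^2) || N(0, prior_sigma^2) ), summed over
    topic dimensions. prior_sigma is the knob the current
    implementation fixes at 1.0."""
    sigma2 = np.exp(2.0 * logsigma)
    kl = (np.log(prior_sigma) - logsigma
          + (sigma2 + mu ** 2) / (2.0 * prior_sigma ** 2)
          - 0.5)
    return kl.sum(axis=-1)

mu, logsigma = np.zeros(10), np.zeros(10)
print(kl_to_prior(mu, logsigma))            # 0.0 (posterior equals prior)
print(kl_to_prior(mu, logsigma, 0.5) > 0)   # True: tighter prior, larger KL
```

With prior_sigma=1 and logsigma=0 the per-dimension KL collapses to the familiar -0.5 * (1 + 2*logsigma - mu^2 - sigma^2) form; shrinking prior_sigma pulls the logits toward zero, which after the softmax means flatter, smoother topic proportions.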

qixiang109 avatar Apr 22 '20 09:04 qixiang109

@WalterKung you mentioned you are using get_theta(normalized_data_batch) to get the topic distribution. Is this the correct way?

There are quite a few questions in this repo on how to predict on new data: https://github.com/adjidieng/ETM/issues/4

Thanks in advance!

ydennisy avatar Oct 09 '20 17:10 ydennisy