DETM
can't reproduce the preprocessed data
Hi there, I ran https://github.com/adjidieng/DETM/blob/master/scripts/data_undebates.py on the Kaggle data for the UN debates (as linked in your paper: https://www.kaggle.com/unitednations/un-general-debates), but I am unable to reproduce the preprocessed data you linked here: https://bitbucket.org/franrruiz/data_undebates_largev/src/master/ (the variables in my .mat files are different from yours). Any idea why? There are not many settings besides min_df and max_df. I used the defaults; perhaps you used something else?
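To illustrate why min_df and max_df matter here, below is a minimal sketch of document-frequency filtering. It is not the code from data_undebates.py; `build_vocab` and the toy documents are hypothetical, and the thresholds follow the common convention (used by scikit-learn's CountVectorizer, for example) that an integer min_df is an absolute document count and a float max_df is a fraction of documents. Even a small change in either value changes the vocabulary, and therefore every variable derived from it.

```python
from collections import Counter

def build_vocab(docs, min_df=1, max_df=1.0):
    """Keep words whose document frequency lies in [min_df, max_df * n_docs].

    min_df is an absolute document count; max_df is a fraction of documents.
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        # Count each word at most once per document.
        doc_freq.update(set(doc.split()))
    return sorted(w for w, df in doc_freq.items()
                  if df >= min_df and df <= max_df * n_docs)

# Hypothetical toy corpus, for illustration only.
docs = [
    "the assembly met today",
    "the council met",
    "the assembly voted",
    "the delegates spoke",
]
vocab_default = build_vocab(docs)                       # no pruning
vocab_pruned = build_vocab(docs, min_df=2, max_df=0.7)  # drops rare and very common words
```

With these toy documents, the pruned vocabulary keeps only the words that occur in at least two documents but in no more than 70% of them.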
Might be too obvious, but could it just be because of the random permutation with no seed? Apart from that, I've observed a lot of things I had to change in the code to get it to run and to implement the model as described in the paper. I was never able to reproduce the results using the original code.
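A minimal sketch of the seeding idea, assuming the preprocessing shuffles document indices before splitting them. `split_indices`, the 0.85 train fraction, and the seed value are hypothetical and not taken from the original script; if the script uses NumPy's global generator, calling `numpy.random.seed` before the permutation has the same effect.

```python
import random

def split_indices(n_docs, train_frac=0.85, seed=None):
    """Shuffle document indices and split them into train/test lists."""
    rng = random.Random(seed)  # seed=None gives a different shuffle on each run
    idx = list(range(n_docs))
    rng.shuffle(idx)
    n_train = round(train_frac * n_docs)
    return idx[:n_train], idx[n_train:]

# Two runs with the same seed produce the same split.
train_a, test_a = split_indices(100, seed=0)
train_b, test_b = split_indices(100, seed=0)
same_split = (train_a == train_b) and (test_a == test_b)
```

Note that even with a fixed seed, the output only matches the published data if the authors used the same seed and the same generator, which the thread does not establish.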
hm...possibly. Same here on having to change a lot. Perhaps we should submit some PRs.
Let's work on converting it to a python library @quynhneo @mona-timmermann
What do you think?
Although I notice a new error that occurs on a large dataset
Not a bad idea... Ideally, @adjidieng would support the idea.
I can talk to @adjidieng tomorrow, and I will keep you posted on her response.
What do you think, @mona-timmermann?
Adji said we can proceed, but we will upload the package as a branch on this repo. @quynhneo @mona-timmermann let's get this done.
@Emekaborisama Hi, are there any updates on the Python script to reproduce this study? Thank you very much.
that's cool. thx.
On Wed, Feb 3, 2021 at 4:47 PM, Quynh M. Nguyen wrote:
I have made it work; see my fork: https://github.com/quynhneo/DETM
Hi Mr Nguyen,
I have a follow-up question about the script that runs DETM after preprocessing the data. I checked your script, and you split the data into training and test sets.
Why did you do that? I thought this is supposed to be unsupervised learning. Thank you very much.
According to the paper, they calculate perplexity using test documents, so a held-out test set is needed for evaluation even though training itself is unsupervised.
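For reference, a minimal sketch of how perplexity on held-out documents is typically computed: the exponential of the negative average per-token log-likelihood. This is a generic illustration, not the paper's exact evaluation protocol and not code from either repository; the `perplexity` function and the toy inputs are hypothetical.

```python
import math

def perplexity(word_probs, counts):
    """exp of the negative average per-token log-likelihood.

    word_probs[d][w]: model probability of word w in document d.
    counts[d][w]: number of times word w occurs in held-out document d.
    """
    total_log_lik = 0.0
    total_tokens = 0
    for probs, cnts in zip(word_probs, counts):
        for p, c in zip(probs, cnts):
            if c > 0:
                total_log_lik += c * math.log(p)
                total_tokens += c
    return math.exp(-total_log_lik / total_tokens)

# Toy check: a uniform model over a 4-word vocabulary has perplexity 4.
uniform = [[0.25, 0.25, 0.25, 0.25]] * 2
held_out = [[1, 0, 2, 0], [0, 3, 0, 1]]
ppl = perplexity(uniform, held_out)
```

Lower perplexity means the model assigns higher probability to words in documents it did not see during training, which is why the test documents must be kept out of the training set.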