
Topics in onlinehdp

Open Abigale001 opened this issue 6 years ago • 3 comments

In the paper, the authors say that the number of topics can be determined with cross-validation or held-out likelihood. But when I run the code with the default T and K, the number of topics is always equal to T. Does anyone know why?

Abigale001, Aug 07 '18 08:08

Before that, the authors say "In a traditional setting, where fitting multiple models might be viable", and right after that they say: "However, these techniques become impractical when the data set size is large, and they become impossible when the data are streaming. Online HDP provides the speed of online variational Bayes with the modeling flexibility of the HDP."

T=150, and by default only 20 topics are shown, ordered by relevance.

jorgecastillo2, Oct 03 '18 03:10

@jorgecastillo2 this does not match the stated motivation: "Given a document collection, posterior inference is used to determine the number of topics needed and to characterize their distributions." Many discussions on this point are still open, and no concrete answer has been given yet.

zeyd31, Feb 11 '19 14:02

The gensim implementation based on it has the same problem: the number of topics inferred is always equal to T. I will have to use the C++ implementation instead.

dgarridoa, Mar 01 '20 20:03
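
One possible reading of the behaviour discussed in this thread, offered as a sketch rather than a confirmed answer: in a truncated variational HDP, T is only an upper bound, so the output always has T topic slots, and the "number of topics" would be the count of slots that carry non-negligible weight. The snippet below computes expected corpus-level topic weights from the Beta parameters of the stick-breaking variables and counts those above a cutoff. The function names, the toy values, and the 0.05 threshold are all hypothetical and are not taken from the online-hdp code; reading the real parameters out of a fitted model is left out, since the attribute names would be an assumption.

```python
def expected_topic_weights(a, b):
    """Expected stick-breaking weights for a truncated model.

    a, b: Beta parameters of the first T-1 sticks (hypothetical inputs).
    Returns a list of T weights that sums to 1.
    """
    weights = []
    remaining = 1.0  # expected stick length not yet assigned
    for a_k, b_k in zip(a, b):
        mean_v = a_k / (a_k + b_k)      # mean of Beta(a_k, b_k)
        weights.append(remaining * mean_v)
        remaining *= 1.0 - mean_v
    weights.append(remaining)           # last topic takes the remainder
    return weights


def effective_num_topics(weights, threshold=0.05):
    """Count topics whose expected weight reaches the (arbitrary) threshold."""
    return sum(1 for w in weights if w >= threshold)


# Toy values only, not output from any real run: truncation T = 5.
a = [9.0, 8.0, 1.0, 1.0]
b = [1.0, 1.0, 50.0, 50.0]
weights = expected_topic_weights(a, b)
n_effective = effective_num_topics(weights, threshold=0.05)
print(len(weights), n_effective)
```

With these toy values the model still reports 5 topic slots, but only 2 of them pass the cutoff. The threshold is a judgment call, so the resulting count should be read as a rough indicator and not as an inferred quantity.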