
Incremental learning

Open · cassiehkx opened this issue 6 years ago · 5 comments

It worked very well on a small dataset. Could it be extended to support incremental learning for the case of a huge dataset?

cassiehkx · Jun 07 '18 11:06

@cassiehkx can you share a link to some other form of LDA that supports incremental learning, or to any other library that does?

Basically, I want to understand how that works so I can think of an approach for doing it here.

Thanks :)

vi3k6i5 · Jun 07 '18 11:06

I don't have a concrete suggestion for enabling incremental learning. But the underlying problem is that the input is a dense NumPy array, which keeps the program from scaling to a large amount of data; incremental learning is just one solution I could think of at the moment. Maybe we could change the input data into a sparse matrix? But in that case the matrix multiplication in the log-likelihood computation would be a problem. What would you recommend?
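
For what it's worth, the count-weighted term of the LDA log likelihood only touches cells where the count is non-zero, so it can be computed without ever densifying the matrix. Below is a minimal sketch using scipy.sparse; the `theta`/`phi` arrays are random stand-ins for the model's distributions, not GuidedLDA's actual internals:

```python
import numpy as np
from scipy import sparse

# Hypothetical dense document-term matrix (n_docs x n_vocab); a real
# corpus would be built directly in sparse form.
X_dense = np.array([[0, 2, 1],
                    [3, 0, 0],
                    [1, 1, 4]])

# COO/CSR store only the non-zero counts, so memory scales with the
# number of non-zero entries instead of n_docs * n_vocab.
X = sparse.coo_matrix(X_dense)

# Illustrative doc-topic (theta) and topic-word (phi) distributions.
n_topics = 2
theta = np.random.dirichlet(np.ones(n_topics), size=X.shape[0])
phi = np.random.dirichlet(np.ones(X.shape[1]), size=n_topics)
P = theta @ phi  # doc-word probabilities

# Sum over non-zero (d, w) of n_dw * log(p_dw), visiting only the
# stored entries of the sparse matrix.
loglik = np.sum(X.data * np.log(P[X.row, X.col]))
print(loglik)
```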

cassiehkx · Jun 12 '18 07:06

Fair point. Switching to a sparse matrix should be easier than implementing incremental learning.

vi3k6i5 · Jun 12 '18 12:06

Another question is the difference between your code and the original scikit-learn LDA code, where the eta parameter can control the initialization weights. The paper you were referring to describes a more sophisticated method, while your code seems to only give the seed words a higher weight at initialization and does not do much during the log-likelihood calculation. So what would be the difference compared to just setting an initialized eta matrix with the seed words?
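
For comparison, this is roughly what the "just set an eta matrix" alternative looks like in gensim, whose `eta` parameter accepts a `(num_topics, num_terms)` array (scikit-learn's `topic_word_prior` only takes a scalar). The corpus, seed words, and weights below are purely illustrative:

```python
import numpy as np
from gensim import corpora, models

# Toy corpus; in practice this comes from your own preprocessing.
texts = [["price", "market", "stock"], ["game", "team", "win"],
         ["market", "trade", "price"], ["team", "player", "game"]]
dictionary = corpora.Dictionary(texts)
corpus = [dictionary.doc2bow(t) for t in texts]

num_topics = 2
num_terms = len(dictionary)

# Asymmetric prior: give each seed word a larger eta in its target
# topic. The seed assignments and weights here are illustrative only.
eta = np.full((num_topics, num_terms), 0.01)
seeds = {0: ["price", "market", "stock"], 1: ["game", "team", "win"]}
for topic_id, words in seeds.items():
    for word in words:
        eta[topic_id, dictionary.token2id[word]] = 1.0

lda = models.LdaModel(corpus, num_topics=num_topics,
                      id2word=dictionary, eta=eta)
```

The design difference worth noting: a prior like this biases every inference update, whereas a seeded initialization only biases the starting state and can be washed out as sampling proceeds.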

cassiehkx · Jun 13 '18 02:06

In the gensim implementation of LDA, I think you can set the chunk size to learn incrementally?
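
gensim's `LdaModel` implements online variational Bayes (Hoffman et al., 2010): it processes the corpus in `chunksize`-document batches, and its `update()` method folds new documents into an already trained model. A minimal sketch with a toy corpus, parameter values illustrative only (scikit-learn's `LatentDirichletAllocation.partial_fit` offers a similar mini-batch API):

```python
from gensim import corpora
from gensim.models import LdaModel

# Toy batches; real corpora would be streamed from disk.
batch1 = [["stock", "market", "price"], ["trade", "market"]]
batch2 = [["team", "game", "win"], ["player", "team"]]

# For simplicity the dictionary here covers both batches up front.
dictionary = corpora.Dictionary(batch1 + batch2)
corpus1 = [dictionary.doc2bow(doc) for doc in batch1]
corpus2 = [dictionary.doc2bow(doc) for doc in batch2]

# chunksize controls how many documents each online VB update sees.
lda = LdaModel(corpus=corpus1, id2word=dictionary,
               num_topics=2, chunksize=100, update_every=1)

# Later, incrementally fold new documents into the trained model.
lda.update(corpus2)
```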

tmerrittsmith · Aug 14 '18 08:08