lda2vec-tf

get the embedding of a new document

Open ali3assi opened this issue 6 years ago • 18 comments

After we run the code and get the document embeddings, how can we use them to predict the embedding of a new, unobserved document?

ali3assi avatar May 03 '18 20:05 ali3assi

I've been thinking about the same thing. Technically you can't (to my limited understanding). However, you could train a separate doc2vec model and use something like gensim to infer the vector for the new document. Perhaps you could additionally make sure it is on the same scale as the lda2vec doc vectors. Either way, this is a tricky problem that I hope someone can figure out a good solution for!

I'll keep you updated if I figure something out that works in code.
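A minimal sketch of the "same scale" idea only, assuming the new document's vector has already been inferred elsewhere (for example with gensim's `Doc2Vec.infer_vector`). The helper names and the toy numbers are hypothetical, not part of lda2vec-tf. Note that this matches the vector's magnitude to the lda2vec doc vectors; it does not align the two embedding spaces, so distances between the rescaled vector and the lda2vec vectors are still not meaningful on their own.

```python
import math


def l2_norm(vec):
    """Euclidean length of a vector given as a list of floats."""
    return math.sqrt(sum(x * x for x in vec))


def mean_norm(vectors):
    """Average Euclidean length over a collection of vectors."""
    return sum(l2_norm(v) for v in vectors) / len(vectors)


def rescale_to_reference(inferred, reference_vectors):
    """Scale `inferred` so its length equals the mean length of the references.

    Hypothetical helper: only the magnitude changes, the direction is kept.
    """
    current = l2_norm(inferred)
    if current == 0.0:
        # A zero vector has no direction to preserve; return it unchanged.
        return list(inferred)
    factor = mean_norm(reference_vectors) / current
    return [x * factor for x in inferred]


# Toy stand-ins (assumptions): three trained lda2vec doc vectors and one
# vector inferred for an unseen document by a separate doc2vec model.
lda2vec_doc_vectors = [[3.0, 4.0], [0.0, 5.0], [5.0, 0.0]]
inferred = [0.6, 0.8]

rescaled = rescale_to_reference(inferred, lda2vec_doc_vectors)
print(rescaled)
```

Matching the mean norm is just one possible choice; standardizing each dimension against the lda2vec vectors would be another.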

nateraw avatar May 07 '18 14:05 nateraw

@nateraw thank you. By the way, you can check https://github.com/vijeth8/lda2vec-featurizer. That version throws an exception when the test document's vocabulary is small.

ali3assi avatar May 07 '18 14:05 ali3assi

@TamouzeAssi No problem! Thanks for the link, I'll check it out. I actually have my own version adapted from this one working in tf 1.5+. Still looking to add more to it though.

nateraw avatar May 07 '18 18:05 nateraw

@TamouzeAssi Did you see how the repository you linked to handled out of vocabulary documents with lda2vec? It seems that is a fair way to do it!! It is in the Readme.

nateraw avatar May 07 '18 18:05 nateraw

@nateraw I'm trying to run their code, but unfortunately I haven't been able to get it to execute yet, and there is no support from them.

ali3assi avatar May 08 '18 14:05 ali3assi

@TamouzeAssi Let me upload my code for you tomorrow. If possible, I will try to replicate what they have going on before I upload it. I have my last finals for school today, so tomorrow I will be free.

I will offer support as much as possible, and will be continually updating that repository. :smile:

nateraw avatar May 08 '18 14:05 nateraw

@nateraw good luck with your exams. I would appreciate your help very much.

ali3assi avatar May 08 '18 14:05 ali3assi

@nateraw did you get time to check, please?

ali3assi avatar May 10 '18 16:05 ali3assi

@TamouzeAssi I didn't, unfortunately. I'm trying to make some user friendly changes as well as run some experiments. It needs documentation as well, so that people will understand how to interface with it.

I thought my experiment was going to take less time, but it proved to be a little tricky. I'll try to upload it within 48 hours. Sorry for the delay!!!

nateraw avatar May 10 '18 20:05 nateraw

@nateraw thanks for your cooperation!

ali3assi avatar May 10 '18 20:05 ali3assi

@TamouzeAssi I uploaded my code, check it out. I wasn't able to implement the functionality we talked about yet. :frowning_face: I'll try to get after it. Post any issues you have or functionality you want and I'll try to add it. Right now, I know there are a couple issues with reloading the model...they have to be fixed by messing with the logdir variables in the lda2vec.py file...I'll fix it ASAP.

Hope this helps, and doesn't just add additional confusion!

nateraw avatar May 11 '18 19:05 nateraw

@nateraw now I'm confused. Which code are you talking about: yours or https://github.com/vijeth8/lda2vec-featurizer? Can you send me your email?

ali3assi avatar May 11 '18 19:05 ali3assi

@TamouzeAssi [email protected], email me any time.

I mean my code! Also, I fixed the restore feature.

nateraw avatar May 11 '18 19:05 nateraw

I'm a little bit confused. Your code cannot generate topics for a test document, so please correct me if I'm wrong: are you trying to add this feature?

Which restore feature do you mean?

ali3assi avatar May 11 '18 19:05 ali3assi

I was talking about model saving/restoring (the weights, etc.). It was broken before, but now it works.

Topic modeling for out-of-corpus documents does not work yet; we need to add it following the way the other repository does it. I did not get time to implement it, unfortunately.

nateraw avatar May 11 '18 19:05 nateraw

There is only one other repository that talks about this feature.

ali3assi avatar May 11 '18 19:05 ali3assi

Yes! I will try to implement this feature in my version ASAP. I'm not exactly sure how they are doing it in that repository (lda2vec-featurizer), but I will try to figure it out and add it. You might be able to figure it out on your own by using the get_k_closest function in my version, but it would probably be extremely confusing.

If you have any issues, post them on my repository!
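For anyone landing here, a rough sketch of one way a "closest topics" lookup for an unseen document could work. This is an assumption on my part: it is not the method used by lda2vec-featurizer, and `closest_topics` is a hypothetical helper, not the `get_k_closest` function mentioned above. It assumes trained word vectors and topic vectors that live in the same space, averages the vectors of the in-vocabulary tokens, and ranks topics by cosine similarity. Documents with no known tokens raise a clear error instead of failing somewhere deeper.

```python
import math


def mean_vector(vectors):
    """Element-wise mean of equally sized vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def closest_topics(tokens, word_vectors, topic_vectors, k=2):
    """Return (indices of the k most similar topics, pseudo topic weights).

    Hypothetical helper: tokens missing from `word_vectors` are skipped.
    """
    known = [word_vectors[t] for t in tokens if t in word_vectors]
    if not known:
        raise ValueError("document has no in-vocabulary tokens")
    doc_vec = mean_vector(known)
    sims = [cosine(doc_vec, t) for t in topic_vectors]
    ranked = sorted(range(len(sims)), key=lambda i: sims[i], reverse=True)
    return ranked[:k], softmax(sims)


# Toy data (assumptions): 2-d word vectors and two topic vectors.
word_vectors = {"goal": [1.0, 0.0], "match": [0.9, 0.1], "vote": [0.0, 1.0]}
topic_vectors = [[1.0, 0.0], [0.0, 1.0]]

top, weights = closest_topics(
    ["goal", "match", "unseenword"], word_vectors, topic_vectors, k=1
)
print(top, weights)
```

The softmax weights here are only similarity-based pseudo-proportions. Fitting the document's topic weights by gradient descent with the word and topic vectors frozen would be closer in spirit to how lda2vec trains, but that needs the actual model.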

nateraw avatar May 11 '18 20:05 nateraw

duplicate of #1

MovGP0 avatar Jun 02 '18 06:06 MovGP0