lda2vec-tf
get the embedding of a new document
After we run the code and get the document embeddings, how can we use them to predict the embedding of a new, unobserved document?
I've been thinking about the same thing. Technically you can't (to my limited understanding). However, you could train a separate doc2vec model and use something like gensim to infer the vector for the new document. Perhaps you could additionally make sure it is on the same scale as the lda2vec doc vectors. Either way, this is a tricky problem that I hope someone can figure out a good solution for!
I'll keep you updated if I figure something out that works in code.
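The "same scale" idea above could be sketched roughly as follows. This assumes the new document's vector has already been inferred elsewhere (for example with gensim's `Doc2Vec.infer_vector`) and that "same scale" means matching the average L2 norm of the trained lda2vec document vectors. The helper names and toy values are hypothetical, and matching the norm does not align the two embedding spaces; it only makes the magnitudes comparable.

```python
import math

def l2_norm(vec):
    """Euclidean length of a vector given as a list of floats."""
    return math.sqrt(sum(x * x for x in vec))

def rescale_to_reference(vec, reference_vecs):
    """Scale vec so its L2 norm equals the mean L2 norm of reference_vecs."""
    norm = l2_norm(vec)
    if norm == 0.0 or not reference_vecs:
        # Nothing sensible to rescale against; return a copy unchanged.
        return list(vec)
    target = sum(l2_norm(r) for r in reference_vecs) / len(reference_vecs)
    return [x * target / norm for x in vec]

# Hypothetical stand-ins for trained lda2vec document vectors.
lda2vec_doc_vecs = [[3.0, 4.0], [6.0, 8.0]]
# Hypothetical vector inferred for the new document by another model.
inferred = [0.6, 0.8]

rescaled = rescale_to_reference(inferred, lda2vec_doc_vecs)
print(rescaled)
```

The direction of the inferred vector is preserved; only its length changes.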
@nateraw thank you. By the way, you can check https://github.com/vijeth8/lda2vec-featurizer. That version throws an exception when the test document has only a small vocabulary.
@TamouzeAssi No problem! Thanks for the link, I'll check it out. I actually have my own version adapted from this one working in tf 1.5+. Still looking to add more to it though.
@TamouzeAssi Did you see how the repository you linked to handles out-of-vocabulary documents with lda2vec? It seems like a fair way to do it!! It is in the README.
@nateraw I'm trying to run their code, but unfortunately I haven't been able to get it to execute yet, and there is no support from them.
@TamouzeAssi Let me upload my code for you tomorrow. If possible, I will try to replicate what they have going on before I upload it. I have my last finals for school today, so tomorrow I will be free.
I will offer support as much as possible, and will be continually updating that repository. :smile:
@nateraw good luck with your exams, I will appreciate your help so much.
@nateraw did you get time to check, please?
@TamouzeAssi I didn't, unfortunately. I'm trying to make some user-friendly changes as well as run some experiments. It needs documentation as well, so that people will understand how to interface with it.
I thought my experiment was going to take less time, but it proved to be a little tricky. I'll try to upload it within 48 hours. Sorry for the delay!!!
@nateraw thanks for your cooperation!
@TamouzeAssi I uploaded my code, check it out. I wasn't able to implement the functionality we talked about yet. :frowning_face: I'll try to get after it. Post any issues you have or functionality you want and I'll try to add it. Right now, I know there are a couple issues with reloading the model...they have to be fixed by messing with the logdir variables in the lda2vec.py file...I'll fix it ASAP.
Hope this helps, and doesn't just add confusion!
@nateraw now I'm confused. Which code are you talking about: yours or https://github.com/vijeth8/lda2vec-featurizer? Can you send me your email?
@TamouzeAssi [email protected], email me any time.
I mean my code! Also, I fixed the restore feature.
I'm a little confused. Your code cannot do topic modeling for a test document, so please correct me if I'm wrong: are you trying to add this feature?
Which restore feature do you mean?
I was talking about model saving/restoring (the weights/etc), it was broken before but now it works.
The topic modeling for Out of Corpus documents does not work yet, we need to add it according to the way the other repository does it. I did not get time to implement it, unfortunately.
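For illustration only, one simple heuristic for an out-of-corpus document is to average the trained word vectors of the tokens that are in the vocabulary, then turn the cosine similarity to each topic vector into proportions with a softmax. This is not how lda2vec itself fits document weights, and it is not confirmed to be what lda2vec-featurizer does; the function names and toy vectors below are hypothetical. Returning `None` when no token is in the vocabulary avoids the kind of failure reported earlier for very small test documents.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def topic_proportions(tokens, word_vecs, topic_vecs):
    """Approximate topic proportions for an unseen document (heuristic)."""
    # Keep only tokens the trained model knows about.
    known = [word_vecs[t] for t in tokens if t in word_vecs]
    if not known:
        return None  # no in-vocabulary tokens, nothing to estimate from
    dim = len(known[0])
    doc_vec = [sum(v[i] for v in known) / len(known) for i in range(dim)]
    sims = [cosine(doc_vec, t) for t in topic_vecs]
    # Numerically stable softmax over the similarities.
    peak = max(sims)
    exps = [math.exp(s - peak) for s in sims]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical toy embeddings standing in for a trained model.
word_vecs = {"cat": [1.0, 0.0], "dog": [0.9, 0.1], "stock": [0.0, 1.0]}
topic_vecs = [[1.0, 0.0], [0.0, 1.0]]

pets = topic_proportions(["cat", "dog", "unseen"], word_vecs, topic_vecs)
empty = topic_proportions(["unseen"], word_vecs, topic_vecs)
print(pets, empty)
```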
There is only one other repository that talks about this feature.
Yes! I will try to implement this feature in my version ASAP. I'm not exactly sure how they do it in the lda2vec-featurizer repository, but I will try to figure it out and add it. You might be able to figure it out on your own by using the get_k_closest function in my version, but it would probably be extremely confusing.
If you have any issues, post them on my repository!
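As a rough idea of what a "k closest" lookup generally does, here is a minimal sketch that ranks named vectors by cosine similarity to a query vector. The `k_closest` helper below is hypothetical and is not the signature or implementation of `get_k_closest` in the repository mentioned above; the toy vectors are made up.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def k_closest(query, named_vecs, k=3):
    """Return the names of the k vectors most similar to query."""
    ranked = sorted(
        named_vecs,
        key=lambda name: cosine(query, named_vecs[name]),
        reverse=True,
    )
    return ranked[:k]

# Hypothetical toy embeddings.
named_vecs = {"cat": [1.0, 0.0], "dog": [0.9, 0.1], "stock": [0.0, 1.0]}
closest = k_closest([1.0, 0.0], named_vecs, k=2)
print(closest)
```

The same lookup could be pointed at topic vectors instead of word vectors to see which topics sit nearest to an averaged document vector.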
duplicate of #1