skip-thoughts
Encoding problem on the Windows platform
I am having a problem while encoding sentences on the Windows platform. Is it because of the pre-trained model weights? How can I solve it?
You should provide detailed information about the error you're getting, and where exactly your code fails.
On Tue, Mar 28, 2017, 6:50 AM Avhirup Chakraborty [email protected] wrote:
Thanks Avhirup, I have the same issue.
@csiki Do you have any idea on how to solve this query using Windows OS?
Have you downloaded and linked the dictionary files? It seems you have, because you're not getting any errors on that step. But the word 'pink' should be in the dictionary. Your code fails when trying to load the unknown ('UNK') word vector from the dictionary, which it cannot find either. Please check the length and content of model['utable'] and model['btable'] after the tables are loaded - if they are empty, you don't have the correct dictionary files, or further investigation is needed.
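A quick sanity check you can run on the loaded tables. The toy dict below is a stand-in for model['utable'] (substitute the real table after loading); it also shows the symptom being diagnosed here: on Windows the keys may come back as raw UTF-16-encoded bytes, so plain-string lookups like 'pink' fail even though the entry exists:

```python
# toy stand-in for model['utable']: keys may be UTF-16-encoded bytes on Windows
utable = {'pink'.encode('utf-16'): [0.1, 0.2]}

print(len(utable))                         # how many entries were loaded
print(list(utable.keys())[:5])             # peek at a few keys
print('pink' in utable)                    # False: the key is bytes, not str
print('pink'.encode('utf-16') in utable)   # True
```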
P.S. I have not tried skip-thought on a Windows system.
@csiki These are the results I am getting. Since the file is big, I can't print the whole content at once, so I tried model['utable'][0], but it is showing a KeyError: 0. What should be done now?
You're getting KeyError: 0 because model['utable'] and its btable counterpart are dictionaries. Basically, the key is the word and the value is the word vector - and there is no key 0.
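A small illustration of why integer indexing fails on these tables, and how to peek at the first entry instead (the toy dict stands in for the real model['utable']):

```python
# toy stand-in: the table maps words (string keys) to vectors (values)
utable = {'pink': [0.1, 0.2], 'flower': [0.3, 0.4]}

print(utable['pink'])        # lookup by word works
try:
    utable[0]                # integer indexing raises KeyError: 0
except KeyError as err:
    print('KeyError:', err)

# to peek at the first entry, iterate instead of indexing:
first_word = next(iter(utable))
print(first_word, utable[first_word])
```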
Those \x00 bytes you see everywhere are null bytes: your keys are stored in UTF-16 encoding, which pads each ASCII character to two bytes. On Windows you probably lose that encoding information when the tables are loaded, so the keys stay as raw UTF-16 bytes - that's my explanation, but it's pretty superficial.
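You can see both effects directly by encoding a word yourself: the \x00 padding on each character, plus the 2-byte byte-order mark (BOM) that the 'utf-16' codec prepends (the exact byte order depends on your machine):

```python
s = 'pink'
b = s.encode('utf-16')
print(b)      # e.g. b'\xff\xfep\x00i\x00n\x00k\x00' on a little-endian machine
print(b[:2])  # the 2-byte BOM
print(b[2:])  # the characters, each padded with \x00
print(b.decode('utf-16'))  # round-trips back to 'pink'
```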
You have two ways to solve this:
- Before you provide a string (your sentence) to the encode() function, encode it in UTF-16 and drop the first 2 bytes (the byte-order mark that Python prepends when turning your string into UTF-16); something like this:
# init model ...
sentence = 'pink flower'
x = sentence.encode('utf-16')[2:]  # UTF-16 bytes with the 2-byte BOM stripped
vector = skipthoughts.encode(model, [x])  # encode() expects a list of sentences
- Load the model, run through all the keys, and decode them from UTF-16, swapping the original keys for the decoded ones. You can use string.decode('utf-16'), similarly to encode. This requires a bit more work. If you have a hard time getting it done, just write another comment here.
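A minimal sketch of this second option, assuming the tables are plain dicts keyed by UTF-16-encoded bytes (the toy dict stands in for the real loaded model; adapt it to however your tables actually look):

```python
def decode_table_keys(model):
    """Re-key model['utable'] and model['btable'] from UTF-16 bytes to str."""
    for name in ('utable', 'btable'):
        model[name] = {
            (k.decode('utf-16') if isinstance(k, bytes) else k): v
            for k, v in model[name].items()
        }
    return model

# usage with a toy stand-in for the loaded model:
toy = {'utable': {'pink'.encode('utf-16'): [0.1]},
       'btable': {'flower'.encode('utf-16'): [0.2]}}
decode_table_keys(toy)
print('pink' in toy['utable'])      # True: keys are plain strings now
print(toy['btable']['flower'])      # [0.2]
```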
@csiki Can you help me with the ValueError I am getting? I am attaching a screenshot here.
Thank you