skip-thoughts
Encoding problem on the Windows platform
I am having a problem while encoding sentences on the Windows platform. Is it because of the pre-trained model weights? How can I solve it?
You should provide detailed information about the error you're getting, and where exactly your code fails.
On Tue, Mar 28, 2017, 6:50 AM Avhirup Chakraborty [email protected] wrote:
Thanks Avhirup, I have the same issue.
@csiki Do you have any idea on how to solve this query using Windows OS?
Have you downloaded and linked the dictionary files? It seems you have, because you're not getting any errors on that step. But the word 'pink' should be in the dictionary. Your code fails when trying to load the unknown ('UNK') word vector from the dictionary, which it cannot find either. Please check the length and content of model['utable'] and model['btable'] after the tables are loaded - if they are empty, you don't have the correct dictionary files, or further investigation is needed.
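A quick sanity check you can run on the loaded tables. The toy dict below is a stand-in for model['utable'] (substitute the real table after loading); it also shows the symptom being diagnosed here: on Windows the keys may come back as raw UTF-16-encoded bytes, so plain-string lookups like 'pink' fail even though the entry exists:

```python
# toy stand-in for model['utable']: keys may be UTF-16-encoded bytes on Windows
utable = {'pink'.encode('utf-16'): [0.1, 0.2]}

print(len(utable))                         # how many entries were loaded
print(list(utable.keys())[:5])             # peek at a few keys
print('pink' in utable)                    # False: the key is bytes, not str
print('pink'.encode('utf-16') in utable)   # True
```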
P.S. I have not tried skip-thought on a Windows system.
@csiki These are the results I am getting. Since the file is big, I can't print the whole content at once, so I tried model['utable'][0], but it is showing a KeyError: 0. What should be done now?
You're getting KeyError: 0 because model['utable'] and its btable counterpart are dictionaries. Basically, the key is the word and the value is the word vector - and there is no key 0.
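A small illustration of why integer indexing fails on these tables, and how to peek at the first entry instead (the toy dict stands in for the real model['utable']):

```python
# toy stand-in: the table maps words (string keys) to vectors (values)
utable = {'pink': [0.1, 0.2], 'flower': [0.3, 0.4]}

print(utable['pink'])        # lookup by word works
try:
    utable[0]                # integer indexing raises KeyError: 0
except KeyError as err:
    print('KeyError:', err)

# to peek at the first entry, iterate instead of indexing:
first_word = next(iter(utable))
print(first_word, utable[first_word])
```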
Those \x00 bytes you see everywhere are null bytes: your keys are stored in UTF-16 encoding, which pads each ASCII character to two bytes. On Windows you probably lose that encoding information when the tables are loaded, so the keys stay as raw UTF-16 bytes - that's my explanation, but it's pretty superficial.
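You can see both effects directly by encoding a word yourself: the \x00 padding on each character, plus the 2-byte byte-order mark (BOM) that the 'utf-16' codec prepends (the exact byte order depends on your machine):

```python
s = 'pink'
b = s.encode('utf-16')
print(b)      # e.g. b'\xff\xfep\x00i\x00n\x00k\x00' on a little-endian machine
print(b[:2])  # the 2-byte BOM
print(b[2:])  # the characters, each padded with \x00
print(b.decode('utf-16'))  # round-trips back to 'pink'
```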
You have two ways to solve this:
- Before you provide a string (your sentence) to the encode() function, encode it in UTF-16 and drop the first 2 bytes (the byte-order mark that Python prepends when turning your string into UTF-16); something like this:
# init model ...
sentence = 'pink flower'
x = sentence.encode('utf-16')[2:]  # UTF-16 bytes with the 2-byte BOM stripped
vector = skipthoughts.encode(model, [x])  # encode() expects a list of sentences
- Load the model, run through all the keys, and decode them from UTF-16, swapping the original keys for the decoded ones. You can use string.decode('utf-16'), similarly to encode. This requires a bit more work. If you have a hard time getting it done, just write another comment here.
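A minimal sketch of this second option, assuming the tables are plain dicts keyed by UTF-16-encoded bytes (the toy dict stands in for the real loaded model; adapt it to however your tables actually look):

```python
def decode_table_keys(model):
    """Re-key model['utable'] and model['btable'] from UTF-16 bytes to str."""
    for name in ('utable', 'btable'):
        model[name] = {
            (k.decode('utf-16') if isinstance(k, bytes) else k): v
            for k, v in model[name].items()
        }
    return model

# usage with a toy stand-in for the loaded model:
toy = {'utable': {'pink'.encode('utf-16'): [0.1]},
       'btable': {'flower'.encode('utf-16'): [0.2]}}
decode_table_keys(toy)
print('pink' in toy['utable'])      # True: keys are plain strings now
print(toy['btable']['flower'])      # [0.2]
```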
@csiki Can you help me with the ValueError I am getting? I am attaching a screenshot here.
Thank you