Word2Vec.jl: Reading a binary file throws an error because Unicode is not handled.

When asked to read a binary file that contains Unicode, it fails with ERROR: UnicodeError: invalid character index.

To reproduce, load the test file from Google: https://drive.google.com/file/d/0B7XkCwpI5KDYNlNUTTlSS21pQmM/edit?usp=sharing

Feb 25 '17 18:02 jayend-manika

The Python version has an encoding attribute. That may not be exposed here. Need to check.

Aug 22 '17 01:08 sambitdash

I think there is a different reason for this. The original Google files seem to have a slightly different format, so the parser for the binary file reads one byte too far.

Removing the line read(f, UInt8) # new line solves the issue (but presumably files created with this package can then no longer be loaded).

I solved it by adding a loading option :google alongside the existing :text and :binary; in the :google case this read is removed.
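
A minimal sketch of the idea, not the package's actual code: instead of unconditionally consuming one byte after each vector, a reader can skip any newline bytes it meets while collecting the next word, so files with and without the trailing newline both load. The names read_word2vec_binary and sample_file are hypothetical, and the layout assumed here (a text header "vocab_size dim", then for each entry the word, a space, and dim little-endian Float32 values) is the commonly described word2vec binary format, not something confirmed in this thread. Words are collected as raw bytes and converted with String(bytes), so no character indexing is done on partially read UTF-8.

```julia
# Hypothetical helper for illustration; it is not part of the Word2Vec.jl API.
function read_word2vec_binary(io::IO)
    # Assumed header: a text line "<vocab_size> <dim>\n".
    header = split(readline(io))
    vocab_size = parse(Int, header[1])
    dim = parse(Int, header[2])

    words = Vector{String}(undef, vocab_size)
    vectors = Matrix{Float32}(undef, dim, vocab_size)
    raw = Vector{UInt32}(undef, dim)  # scratch buffer for one vector

    for i in 1:vocab_size
        bytes = UInt8[]
        while true
            b = read(io, UInt8)
            b == UInt8(' ') && break      # a space ends the word
            b == UInt8('\n') && continue  # tolerate a newline left over from the previous entry
            push!(bytes, b)
        end
        words[i] = String(bytes)          # bytes are interpreted as UTF-8 as a whole

        # Read exactly dim 32-bit values and convert from little-endian.
        read!(io, raw)
        vectors[:, i] .= reinterpret.(Float32, ltoh.(raw))
    end
    return words, vectors
end

# Hypothetical test data: builds a tiny in-memory file in either layout.
function sample_file(; trailing_newline::Bool)
    io = IOBuffer()
    write(io, "2 3\n")
    for (w, v) in (("héllo", Float32[1, 2, 3]), ("wörld", Float32[4, 5, 6]))
        write(io, w, " ")
        write(io, htol.(reinterpret.(UInt32, v)))
        trailing_newline && write(io, "\n")
    end
    return IOBuffer(take!(io))
end

words, vectors = read_word2vec_binary(sample_file(trailing_newline = false))
```

With this approach a separate :google mode may not be needed, since the same loop accepts both layouts; whether that suits the package is a design decision for the maintainers.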

Jun 07 '18 12:06 Paethon

PR #8 fixes this

Sep 18 '18 09:09 Paethon

Would you please post the code for what you are describing? Honestly, I got confused.

Mar 18 '19 15:03 alabrashJr

So I did the implementation myself, and I am sharing it with you:

https://gist.github.com/alabrashJr/d71cf74bc9713bb0a5bb12ccd331a405

Mar 21 '19 13:03 alabrashJr