wordvectors issues

Korean language

3

I am using it with Korean in gensim 4.0.x. I tried both KeyedVectors.load('ko.bin') and KeyedVectors.load_word2vec_format('ko.bin'), but got the error 'UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position...

trungluu91

UnicodeDecodeError: 'utf8' codec can't decode byte 0x80 in position 0: invalid start byte

19

Hi, I am trying to load the pretrained Chinese word2vec vectors with word_vectors = KeyedVectors.load_word2vec_format(path, binary=True), i.e. the C binary format, and it throws this error.

liwzhi
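Both reports above fail on the very first byte of the file. Byte 0x80 at position 0 is the PROTO opcode that begins a Python pickle (protocol 2 and later), and gensim's own save() writes models with pickle, so load_word2vec_format, which expects an ASCII header line, cannot decode it. A minimal, heuristic sketch for checking what kind of file you actually have, using only the standard library; the function name sniff_embedding_format is hypothetical, and the fastText magic number is taken from fastText's source and assumes a little-endian file.

```python
import struct

# Magic int32 at the start of fastText .bin files (value from fastText's source;
# assumes the file was written on a little-endian machine).
FASTTEXT_MAGIC = 793712314


def sniff_embedding_format(path):
    """Heuristically guess an embedding file's format from its first bytes."""
    with open(path, "rb") as f:
        head = f.read(64)
    if not head:
        return "empty"
    # Pickle protocol 2+ starts with the PROTO opcode 0x80; gensim's save() uses pickle.
    # Decoding this byte as UTF-8 is what raises "can't decode byte 0x80 in position 0".
    if head[0] == 0x80:
        return "pickle"
    if len(head) >= 4 and struct.unpack("<i", head[:4])[0] == FASTTEXT_MAGIC:
        return "fasttext-bin"
    # word2vec text and C binary files both begin with an ASCII "<count> <dim>" line,
    # so this check cannot tell those two apart.
    header = head.split(b"\n", 1)[0].split()
    if len(header) == 2 and all(tok.isdigit() for tok in header):
        return "word2vec-header"
    return "unknown"
```

For example, sniff_embedding_format("ko.bin"). If the result is "pickle", the file probably needs the loader that wrote it (for a gensim-native model, something like gensim.models.Word2Vec.load, possibly under a gensim version close to the one that saved it) rather than load_word2vec_format.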

Training specification for pretrained model

Hello, first of all, thank you for the pretrained model. Since there are many ways to train a fastText model for Korean, I am curious about how you trained your...

maxmarketit

Build out-of-vocabulary word from data.bin

4

Since the advantage of a subword model is that vectors for new words can be built from pretrained character n-grams, I wonder how I can create a new word vector from data.bin...

binhna
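fastText builds a vector for an unseen word by averaging the vectors of its character n-grams: the word is wrapped in "<" and ">", every n-gram of minn to maxn characters is hashed with 32-bit FNV-1a into one of bucket rows stored after the nwords word rows of the input matrix, and those rows are averaged. The sketch below reimplements that lookup in plain Python, assuming data.bin is a standard fastText binary whose input matrix has already been read into a list of rows. The helper names and the toy matrix are hypothetical, and minn, maxn and bucket must match the values the model was trained with (fastText's unsupervised defaults are 3, 6 and 2000000).

```python
FNV_OFFSET_BASIS = 2166136261
FNV_PRIME = 16777619


def fasttext_hash(s):
    """32-bit FNV-1a over the UTF-8 bytes of s, as fastText computes it.

    fastText casts each byte to a signed char before widening it, so bytes
    >= 0x80 (e.g. in Hangul) are sign-extended to 0xFFFFFFxx.
    """
    h = FNV_OFFSET_BASIS
    for b in s.encode("utf-8"):
        h ^= (b | 0xFFFFFF00) if b >= 0x80 else b
        h = (h * FNV_PRIME) & 0xFFFFFFFF
    return h


def char_ngrams(word, minn=3, maxn=6):
    """All n-grams (minn <= n <= maxn code points) of '<word>'; assumes minn >= 2."""
    w = "<" + word + ">"
    grams = []
    for i in range(len(w)):
        for n in range(minn, maxn + 1):
            if i + n > len(w):
                break
            grams.append(w[i:i + n])
    return grams


def oov_vector(word, input_matrix, nwords, bucket, minn=3, maxn=6):
    """Average the n-gram rows of input_matrix for a word not in the vocabulary.

    input_matrix holds nwords word rows followed by bucket n-gram rows.
    Returns a zero vector if the word yields no n-grams, as fastText does.
    """
    rows = [input_matrix[nwords + fasttext_hash(g) % bucket]
            for g in char_ngrams(word, minn, maxn)]
    dim = len(input_matrix[0])
    if not rows:
        return [0.0] * dim
    return [sum(row[k] for row in rows) / len(rows) for k in range(dim)]
```

In practice, gensim.models.fasttext.load_facebook_vectors reads such a .bin file and should perform this n-gram averaging for you when you index an out-of-vocabulary word.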

Loading embeddings

3

Hi, I downloaded the French embeddings and extracted the zip file. How can I load these embeddings in Python and return the embedding for a specified word, e.g.:...

Joseph94m
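If the extracted file is in the word2vec text format (a header line with the vocabulary size and dimension, then one word per line followed by its values), it can be read without any library. A minimal standard-library sketch under that assumption; load_word2vec_text, cosine and most_similar are hypothetical helper names, and a gensim-native .bin file would instead need gensim's own loader.

```python
import math


def load_word2vec_text(path, encoding="utf-8"):
    """Read a word2vec *text* file into a {word: [floats]} dict."""
    vectors = {}
    with open(path, encoding=encoding) as f:
        _count, dim = map(int, f.readline().split())
        for line in f:
            line = line.rstrip()  # also drops the trailing space some writers emit
            if not line:
                continue
            # Split from the right so the last `dim` fields are the values.
            word, *values = line.rsplit(" ", dim)
            if len(values) != dim:
                raise ValueError(f"expected {dim} values for {word!r}, got {len(values)}")
            vectors[word] = [float(v) for v in values]
    return vectors


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def most_similar(vectors, word, topn=5):
    """Rank the other words by cosine similarity to `word` (brute force)."""
    query = vectors[word]
    scored = [(w, cosine(query, v)) for w, v in vectors.items() if w != word]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:topn]
```

Usage: vecs = load_word2vec_text("fr.vec") and then vecs["bonjour"], where the file name is a placeholder. Storing each vector as a Python list keeps the sketch dependency-free but is memory-hungry; for a large vocabulary, gensim's KeyedVectors.load_word2vec_format(path, binary=False) is the practical choice.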

Details on word2vec model

1

Dear Kyubyong, great work, and thank you very much for providing these word vectors! One question: which model did you use to train your word vectors with word2vec? Skip-gram or...

PhilKuhnke

fasttext file format seems wrong

2

Thank you very much for this project. It seems very useful. I don't seem to be able to use the fastText files, at least not the Russian or Turkish ones....

adodge

Dictionary for Japanese morpheme analyzer is not mentioned

Using better link

Calling word embeddings "models" is a bit misleading.