Build out-of-vocabulary word from data.bin

Open binhna opened this issue 6 years ago • 4 comments

Since the advantage of a subword model is that we can build new words from pre-trained character vectors, I wonder how I can create a new word vector from the data.bin file. Does that .bin file contain characters and their vectors? Thanks.

binhna avatar Jun 12 '18 08:06 binhna

The .bin files are fasttext model files. They're slightly out of date, but if you apply the script from https://github.com/Kyubyong/wordvectors/issues/14 you can use the fasttext program to generate word vectors for new words.

adodge avatar Jun 24 '18 21:06 adodge

Yeah, thank you, but I don't seem to know how to use the script. I have the .bin file, your script, and the fasttext program. How exactly do I apply your script to generate vectors for new words?

binhna avatar Jun 25 '18 03:06 binhna

Oh, I get it now. The first and second arguments to your script are the old and new .bin files, respectively. Once we have the new .bin file, we can use fasttext to generate an embedding for a new word. Thanks a lot for your script!

binhna avatar Jun 25 '18 03:06 binhna
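
For readers wondering why a subword model can produce a vector for an unseen word at all, here is a minimal sketch of the idea in plain Python. It does not read the .bin file and is not the fasttext implementation: `char_ngrams`, `oov_vector`, and the toy `ngram_vectors` dictionary are hypothetical names made up for illustration. It assumes n-gram lengths of 3 to 6 (the model discussed here may have been trained with other settings), and it uses a plain dictionary lookup where fasttext hashes n-grams into a fixed number of buckets.

```python
def char_ngrams(word, minn=3, maxn=6):
    """Return the character n-grams of `word`, wrapped in < and > boundary markers.

    minn/maxn are assumed values; a trained model stores its own settings.
    """
    wrapped = f"<{word}>"
    grams = []
    for n in range(minn, maxn + 1):
        for i in range(len(wrapped) - n + 1):
            grams.append(wrapped[i:i + n])
    return grams


def oov_vector(word, ngram_vectors, dim):
    """Average the vectors of the n-grams found in `ngram_vectors`.

    Simplification: n-grams missing from the toy dictionary are skipped,
    and a zero vector is returned when none are known.
    """
    known = [ngram_vectors[g] for g in char_ngrams(word) if g in ngram_vectors]
    if not known:
        return [0.0] * dim
    return [sum(column) / len(known) for column in zip(*known)]


# Toy 2-dimensional n-gram vectors (made-up numbers, not from any model).
toy_vectors = {"<wh": [1.0, 0.0], "re>": [0.0, 1.0]}
print(char_ngrams("where", 3, 3))
print(oov_vector("where", toy_vectors, dim=2))
```

So the new word's vector comes from pieces of the word rather than from single characters, which is why the converted .bin file plus the fasttext program is enough to query words that never appeared in training.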

Hi, I am using the Hindi word2vec model hi.bin. When I look up vectors for words from my corpus, some numbers such as 3740 (३७४०) come back as out of vocabulary. What should I do about this?

kusumlata123 avatar Jun 07 '19 03:06 kusumlata123