wordvectors
Build out-of-vocabulary word from data.bin
Since the advantage of the subword model is that vectors for new words can be built from pre-trained character n-grams, I wonder how I can create a new word vector from the data.bin file. Does that .bin file contain the character n-grams and their vectors? Thanks.
The .bin files are fasttext model files. They're slightly out of date, but if you apply the script from https://github.com/Kyubyong/wordvectors/issues/14 you can use the fasttext program to generate word vectors for new words.
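To illustrate why a subword model can produce a vector for an unseen word at all, here is a minimal sketch of the idea: the word is wrapped in boundary markers, split into character n-grams, and the n-gram vectors are averaged. This is a simplification rather than fasttext's actual code. The names `char_ngrams` and `oov_vector` and the toy n-gram table are hypothetical, and real fasttext hashes n-grams into a fixed number of buckets instead of looking them up in a dictionary.

```python
from typing import Dict, List


def char_ngrams(word: str, minn: int = 3, maxn: int = 6) -> List[str]:
    """Return the character n-grams of `word` wrapped in '<' and '>' markers."""
    wrapped = f"<{word}>"
    grams = []
    for n in range(minn, maxn + 1):
        for i in range(len(wrapped) - n + 1):
            grams.append(wrapped[i:i + n])
    return grams


def oov_vector(word: str, ngram_vectors: Dict[str, List[float]], dim: int) -> List[float]:
    """Average the vectors of those n-grams of `word` that appear in the table.

    Assumption: n-grams missing from the toy table are skipped. Returns a
    zero vector when none of the n-grams are known.
    """
    total = [0.0] * dim
    found = 0
    for gram in char_ngrams(word):
        vec = ngram_vectors.get(gram)
        if vec is None:
            continue
        total = [t + v for t, v in zip(total, vec)]
        found += 1
    if found == 0:
        return total
    return [t / found for t in total]


# Toy table standing in for the n-gram vectors stored in a model file.
toy_ngram_vectors = {
    "<ca": [1.0, 0.0],
    "at>": [0.0, 1.0],
}
```

With this table, `oov_vector("cat", toy_ngram_vectors, 2)` averages the two known n-grams `<ca` and `at>`. A word sharing no n-grams with the table gets a zero vector.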
Yeah, thank you, but I don't know how to use the script. I have the .bin file, your script, and the fasttext program. How exactly do I apply your script to generate vectors for new words?
Oh, I get it now. The first and second arguments to your script are the old and new .bin files, respectively. Once we have the new .bin file, we can use fasttext to generate an embedding for a new word.
Thanks a lot for your script!
Hi, I am using the Hindi word2vec model hi.bin. When I look up vectors for words from my corpus, some numbers such as 3740 (३७४०) are reported as out of vocabulary. What should I do about this?