node-word2vec
`mostSimilar` outputs numbers when using Fasttext word vectors
Hi,
First of all, thanks for the awesome work!
I am trying to import the pre-trained files from the fasttext repo: https://github.com/facebookresearch/fastText/blob/master/pretrained-vectors.md
The model loads without a problem; however, when I try `mostSimilar`, the most similar words appear to be numbers:
loadedModel.mostSimilar('hi')
> [ { word: '73301', dist: 0.4461598818767161 },
{ word: '266', dist: 0.44462500361860946 },
{ word: '399', dist: 0.44260747560473973 },
{ word: '-0.13061', dist: 0.4250619904094889 },
{ word: '745', dist: 0.4089746546859616 },
{ word: '7', dist: 0.39388342200258686 },
{ word: '233', dist: 0.38675386429631425 },
{ word: '.33347', dist: 0.38672456155896373 },
{ word: '999', dist: 0.3798941950492955 },
{ word: '.5158', dist: 0.3761412428047805 },
{ word: '4785', dist: 0.3756878374324986 },
{ word: '', dist: 0.3753017613199615 },
{ word: '4091', dist: 0.3728785618174816 },
{ word: '0.18393', dist: 0.3702285209309231 },
{ word: '5', dist: 0.3694416515730196 },
{ word: '', dist: 0.3682340927295216 },
{ word: '2', dist: 0.3682152969462404 },
{ word: '68', dist: 0.36721353813091373 },
{ word: '10285', dist: 0.36564681449501635 },
{ word: '', dist: 0.36526450978156066 },
{ word: '014575', dist: 0.36389461240841203 },
{ word: '468', dist: 0.36371019302454455 },
{ word: '-0.00046764', dist: 0.3637013226972051 },
{ word: '.012665', dist: 0.36367885124101007 },
{ word: '142', dist: 0.3636392745394945 },
{ word: '574', dist: 0.36060934864973193 },
{ word: '0.6865', dist: 0.3602319353978014 },
{ word: '91', dist: 0.357913584485305 },
{ word: '53', dist: 0.35790250493633724 },
{ word: '925', dist: 0.3576282053138198 },
{ word: '1942', dist: 0.35588944804722655 },
{ word: '', dist: 0.3558833583782604 },
{ word: '3', dist: 0.3546257354328858 },
{ word: '-0.059739', dist: 0.3546232535404894 },
{ word: '', dist: 0.35400407472165496 },
{ word: '08', dist: 0.3536348589615367 },
{ word: '093', dist: 0.35353088901048624 },
{ word: '0.11736', dist: 0.3529077373455495 },
{ word: '.12359', dist: 0.3511316591255266 },
{ word: '10224', dist: 0.35079793819829935 } ]
I also tried `hello`, but it says the word is out of the dictionary. How can I import the fastText files so that this won't happen?
Hello,
I faced a similar issue when using another pre-trained file. The problem was that `loadModel` read the model file as binary although it is actually plain text. `loadModel` decides whether the model file is binary using `mime.lookup(file)`. I fixed the problem by changing the extension of the model file from `.bin` to `.txt`.
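For anyone who wants to check a file before renaming it, here is a minimal sketch of that workaround. It assumes the file follows the word2vec text layout (a header line with vocabulary size and dimension, then one word per line followed by its numbers). Both `looksLikeTextVectors` and `withTxtExtension` are hypothetical helpers written for this example, not part of node-word2vec, and the check is only a heuristic on the first entry.

```javascript
const fs = require('fs');
const path = require('path');

// Matches a plain decimal number such as "0.18393", "-0.059739" or "1e-5".
const DECIMAL = /^[-+]?(\d+\.?\d*|\.\d+)(e[-+]?\d+)?$/i;

// Hypothetical helper: guesses whether a word2vec-style vector file is plain
// text by inspecting the header and the first entry only.
function looksLikeTextVectors(filePath, probeBytes = 65536) {
  const fd = fs.openSync(filePath, 'r');
  let head;
  try {
    const buffer = Buffer.alloc(probeBytes);
    const bytesRead = fs.readSync(fd, buffer, 0, probeBytes, 0);
    // latin1 keeps a one-to-one byte/character mapping, so binary data
    // cannot confuse the splitting below.
    head = buffer.toString('latin1', 0, bytesRead);
  } finally {
    fs.closeSync(fd);
  }

  const lines = head.split('\n');
  if (lines.length < 2) return false;

  // Header line: "<vocabulary size> <vector dimension>"
  const header = lines[0].replace(/[ \r]+$/, '').split(' ');
  if (header.length !== 2 || !/^\d+$/.test(header[0]) || !/^\d+$/.test(header[1])) {
    return false;
  }
  const dim = Number(header[1]);
  if (dim <= 0) return false;

  // In the text layout, the first entry is a word followed by `dim` decimals.
  const fields = lines[1].replace(/[ \r]+$/, '').split(' ');
  if (fields.length !== dim + 1) return false;
  return fields.slice(1).every((field) => DECIMAL.test(field));
}

// Hypothetical helper: renames the file so that it ends in ".txt" and returns
// the new path (or the original path if it already ends in ".txt").
function withTxtExtension(filePath) {
  const parsed = path.parse(filePath);
  if (parsed.ext === '.txt') return filePath;
  const target = path.join(parsed.dir, parsed.name + '.txt');
  fs.renameSync(filePath, target);
  return target;
}
```

If `looksLikeTextVectors(file)` returns `true`, passing the path returned by `withTxtExtension(file)` to `loadModel` should correspond to the rename described above. The probe reads only the first 64 KiB, so it stays fast even for multi-gigabyte vector files.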
Thanks a lot @pizzacat83 for sharing your solution. I'll give it a try as soon as I can.