node-fasttext classification results don't match the original binary
Hi, I have tried using the node-fasttext library in a project. To identify the best model parameters, I relied on the official fastText binary. When I use the same parameters with the node-fasttext library, I get results that differ widely from the binary's. So, unfortunately, I decided to use the Python version of fastText instead, and it worked fine. Can you check whether the node library is using the correct version of fastText, or whether there is any other reason for this issue? I would actually like to use a node library in my future projects.
I would also like to add a suggestion: it would be great if you could support training a model, reloading it, and predicting multiple times with the same fastText instance, instead of creating a new object for each training run, which currently leads to a memory explosion.
The official library has had a lot of updates. Which version of fastText do you use? Could you also provide some information about the parameters you used?
The fastText binary was built from the latest source available, https://github.com/facebookresearch/fastText/commit/25d0bb04bf43d8b674fe9ae5722ef65a0856f5d6, and the Python library is version 0.8.3, as seen at https://pypi.org/project/fasttext/#history
Regarding parameters, here is the data: { input: datapath+'.txt', output: datapath, epoch: 10000, lr: 0.5, lrUpdateRate: 100, wordNgrams: 2, dim: 15 } [UPDATED] I don't think the exact parameters matter, since the results are far off from the official binary's.
In the fastText Python module:
def check(entry):
    if entry.find('\n') != -1:
        raise ValueError(
            "predict processes one line at a time (remove \'\\n\')"
        )
    entry += "\n"
    return entry
every query sentence ends up with a trailing \n, but in this module:
std::vector<PredictResult> arr;
std::vector<int32_t> words, labels;
std::istringstream in(sentence);
dict_->getLine(in, words, labels);
if (words.empty())
{
    return arr;
}
Vector hidden(args_->dim);
Vector output(dict_->nlabels());
std::vector<std::pair<real, int32_t>> modelPredictions;
model_->predict(words, k, 0.0001, modelPredictions, hidden, output);
the \n is not added automatically, so the results differ. We must append \n to the end of the sentence manually. That's the problem.
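Until the module appends the newline itself, a caller-side workaround could look roughly like the sketch below. It mirrors the Python `check()` above: reject embedded newlines, then append a trailing "\n". As far as I can tell, the newline is what lets the dictionary emit the end-of-sentence token, but I have not verified that in this binding. The names `withTrailingNewline` and `predictLine` are made up for illustration, and `predictFn` is a stand-in for whatever prediction call the binding exposes, not its documented API.

```javascript
// Mirror of the Python module's check(): refuse multi-line input,
// then append the trailing "\n" that the official bindings add.
function withTrailingNewline(sentence) {
  if (sentence.includes('\n')) {
    throw new Error("predict processes one line at a time (remove '\\n')");
  }
  return sentence + '\n';
}

// Hypothetical wrapper: `predictFn` is an assumption standing in for the
// binding's real prediction call; it receives the normalized text and k.
function predictLine(predictFn, sentence, k = 1) {
  return predictFn(withTrailingNewline(sentence), k);
}

// Usage with a stub in place of a real model, recording what it receives.
const seen = [];
predictLine((text, k) => { seen.push([text, k]); return []; }, 'hello world', 3);
```

With a real model, the stub would be replaced by a small function that forwards to the classifier's own predict call.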
@lzpfmh Thank you for the investigation! Could you send a PR if you are able to?