
node-fasttext classification results don't match the official binary

Open freakeinstein opened this issue 7 years ago • 4 comments

Hi, I have tried using the node-fasttext library in a project. To identify the best model parameters, I relied on the official fastText binary. When I use the same parameters with the node-fasttext library, I get strange results, so unfortunately I switched to the Python version of fastText instead, which works fine. Can you check whether the node library is using the correct version of fastText, or whether there is any other reason causing this issue? I would really like to use a node library in my future projects.

I would also like to add a suggestion: it would be great if you supported training, model reloading, and repeated prediction with the same fastText instance, instead of creating a new object for each training run, which currently leads to memory explosion.

freakeinstein avatar Jul 03 '18 07:07 freakeinstein

The official library has had a lot of updates. Which version of fastText do you use? Please also provide some info about the parameters you used.

vunb avatar Jul 03 '18 07:07 vunb

The fastText binary was built from the latest source available (https://github.com/facebookresearch/fastText/commit/25d0bb04bf43d8b674fe9ae5722ef65a0856f5d6), and the Python library is version 0.8.3, as seen at https://pypi.org/project/fasttext/#history.

Regarding the parameters, here is the data: `{ input: datapath + '.txt', output: datapath, epoch: 10000, lr: 0.5, lrUpdateRate: 100, wordNgrams: 2, dim: 15 }` [UPDATED] I think the exact parameters don't matter, as the results are far off from the official binary either way.

freakeinstein avatar Jul 03 '18 07:07 freakeinstein

In the fastText Python module:

    def check(entry):
        if entry.find('\n') != -1:
            raise ValueError(
                "predict processes one line at a time (remove \'\\n\')"
            )
        entry += "\n"
        return entry

every query sentence ends with `\n`, but in this module:

    std::vector<PredictResult> arr;
    std::vector<int32_t> words, labels;
    std::istringstream in(sentence);

    dict_->getLine(in, words, labels);

    if (words.empty())
    {
        return arr;
    }

    Vector hidden(args_->dim);
    Vector output(dict_->nlabels());
    std::vector<std::pair<real, int32_t>> modelPredictions;
    model_->predict(words, k, 0.0001, modelPredictions, hidden, output);

there is no `\n` added automatically, so the results differ. We must add `\n` to the end of the sentence manually. That's the problem.

lzpfmh avatar May 11 '20 18:05 lzpfmh

@lzpfmh Thank you for the investigation! Could you help by sending a PR?

vunb avatar May 12 '20 07:05 vunb