
node-fasttext classification results don't match the official binary

Open freakeinstein opened this issue 7 years ago • 4 comments

Hi, I have tried using the node-fasttext library in a project. To identify the best model parameters, I relied on the official fastText binary. When I use the same parameters with the node-fasttext library, I get strange results, so unfortunately I switched to the Python version of fastText instead, which works fine. Can you check whether the node library is using the correct version of fastText, or whether there is any other reason causing this issue? I would really like to use a node library in my future projects.

I would also like to add a suggestion: it would be great if you supported training, model reloading, and repeated prediction with the same fastText instance, instead of creating a new object for each training run, which currently leads to memory explosion.

freakeinstein avatar Jul 03 '18 07:07 freakeinstein

The official library has had a lot of updates. Which version of fastText do you use? Please also provide some info about the parameters you used.

vunb avatar Jul 03 '18 07:07 vunb

The fastText binary was built from the latest source available (https://github.com/facebookresearch/fastText/commit/25d0bb04bf43d8b674fe9ae5722ef65a0856f5d6), and the Python library is version 0.8.3, as seen at https://pypi.org/project/fasttext/#history.

Regarding the parameters, here is the data: `{ input: datapath + '.txt', output: datapath, epoch: 10000, lr: 0.5, lrUpdateRate: 100, wordNgrams: 2, dim: 15 }` [UPDATED] I think the exact parameters don't matter, as the results are far off from the official binary either way.

freakeinstein avatar Jul 03 '18 07:07 freakeinstein

In the fastText Python module:

    def check(entry):
        if entry.find('\n') != -1:
            raise ValueError(
                "predict processes one line at a time (remove \'\\n\')"
            )
        entry += "\n"
        return entry

every query sentence ends with `\n`, but in this module:

    std::vector<PredictResult> arr;
    std::vector<int32_t> words, labels;
    std::istringstream in(sentence);

    dict_->getLine(in, words, labels);

    if (words.empty())
    {
        return arr;
    }

    Vector hidden(args_->dim);
    Vector output(dict_->nlabels());
    std::vector<std::pair<real, int32_t>> modelPredictions;
    model_->predict(words, k, 0.0001, modelPredictions, hidden, output);

there is no `\n` added automatically, so the results differ. We must add `\n` to the end of the sentence manually. That's the problem.

lzpfmh avatar May 11 '20 18:05 lzpfmh

@lzpfmh Thank you for the investigation! Could you help by sending a PR?

vunb avatar May 12 '20 07:05 vunb