JFastText
Predictions are not the same as those from the official C++ implementation
I tested this repo and the official implementation on a number of samples, using the same model (trained with the official code, in .ftz format).
c++
fasttext predict-prob test-example.txt
java(this) - api call
(equal means the label is the same; the probability is ignored)
all samples: 21513
equal: 19236
not-equal: 2219
null (in this repo): 58
java-cmd(this)
java -jar jfasttext-0.4-jar-with-dependencies.jar predict-prob test-example.txt
all samples: 21513
equal: 18825
not-equal: 2688
So, what's wrong?
Another thing: the predictions of java-cmd are unstable and change on every run.
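For anyone who wants to reproduce counts like the ones above, here is a minimal sketch of the comparison. It assumes each prediction is a line of the form `label probability` and that a blank or missing line means no prediction; the class and method names (`PredictionDiff`, `compare`, `labelOf`) are hypothetical and not part of JFastText or fastText.

```java
import java.util.Arrays;
import java.util.List;

public class PredictionDiff {

    /** Counts of matching, differing, and missing predictions. */
    public static final class Counts {
        public int equal;
        public int notEqual;
        public int missing;
    }

    /** Returns the first token of a "label probability" line, or null if the line is blank or null. */
    static String labelOf(String line) {
        if (line == null || line.trim().isEmpty()) {
            return null;
        }
        return line.trim().split("\\s+")[0];
    }

    /** Compares labels line by line; the probabilities are ignored. */
    public static Counts compare(List<String> reference, List<String> candidate) {
        if (reference.size() != candidate.size()) {
            throw new IllegalArgumentException("Both lists must contain one line per sample");
        }
        Counts counts = new Counts();
        for (int i = 0; i < reference.size(); i++) {
            String expected = labelOf(reference.get(i));
            String actual = labelOf(candidate.get(i));
            if (actual == null) {
                counts.missing++;      // the candidate produced no prediction
            } else if (actual.equals(expected)) {
                counts.equal++;
            } else {
                counts.notEqual++;
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up sample lines, for illustration only.
        List<String> cpp = Arrays.asList("__label__a 0.91", "__label__b 0.75", "__label__c 0.60");
        List<String> java = Arrays.asList("__label__a 0.90", "__label__c 0.55", "");
        Counts c = compare(cpp, java);
        System.out.println("equal: " + c.equal + " not-equal: " + c.notEqual + " null: " + c.missing);
    }
}
```

Running the same comparison on two outputs of the same command would also show how much the results vary between runs.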
Found this one: https://github.com/linkfluence/fastText4j. Its predictions are almost the same as the official ones.
Based on what @carschno mentioned in https://github.com/vinhkhuc/JFastText/issues/49, I used this to get the right results:
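    // StringUtils and CollectionUtils are presumably the Apache Commons helpers;
    // `model` is an already loaded JFastText instance.
    public Map<String, Double> predictTopLabel(String text, int k) {
        Map<String, Double> scoreMap = new LinkedHashMap<>();
        // Normalize the input and append a newline before predicting.
        text = StringUtils.trimToEmpty(text) + "\n";
        final List<JFastText.ProbLabel> pl = model.predictProba(text, k);
        for (JFastText.ProbLabel i : CollectionUtils.emptyIfNull(pl)) {
            // logProb is a log-probability; convert it back to a probability.
            final double prob = Math.exp(i.logProb);
            // Round to 8 decimal places; the double divisor avoids integer division.
            final double score = Math.round(prob * 100000000) / 100000000.0;
            scoreMap.put(i.label, score);
        }
        return scoreMap;
    }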
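A note on the rounding step: `Math.round(double)` returns a `long`, so dividing its result by an integer literal is integer division and yields only 0 or 1. Below is a minimal, standalone sketch of the log-probability conversion that shows the difference; the class and method names (`LogProbRounding`, `toRoundedProb`, `truncatedScore`) are hypothetical and do not depend on JFastText.

```java
public class LogProbRounding {

    /** Converts a log-probability to a probability rounded to 8 decimal places. */
    static double toRoundedProb(double logProb) {
        double prob = Math.exp(logProb);
        // Dividing by a double keeps the fractional part.
        return Math.round(prob * 100000000) / 100000000.0;
    }

    /** Same computation with an integer divisor, shown only to illustrate the truncation. */
    static double truncatedScore(double logProb) {
        double prob = Math.exp(logProb);
        // long / int is integer division, so the fraction is lost before widening to double.
        return Math.round(prob * 100000000) / 100000000;
    }

    public static void main(String[] args) {
        double logProb = Math.log(0.25);
        System.out.println(toRoundedProb(logProb));   // 0.25
        System.out.println(truncatedScore(logProb));  // 0.0
    }
}
```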