node-cld
Expose predicted probabilities
An internal function of CLD2, int GetLangScore(uint32 probs, uint8 pslang), lets you get the score of a given language from a language-probability value. It is used at several points to calculate the score of a language given the top 3 predicted languages, for example:
if (indirect < static_cast<int>(obj->kCLDTableSizeOne)) {
  // Up to three languages at indirect
  uint32 langprob = obj->kCLDTableInd[indirect];
  return GetLangScore(langprob, lang1) - GetLangScore(langprob, lang2);
}
I would like the opposite: to get the probabilities for each language, referred to in the code as langprob.
There are also some internal testing functions, such as string GetLangProbTxt(const ScoringContext* scoringcontext, uint32 langprob), that seem to print out these probabilities, so in theory it should be as easy as:
uint32 langprob = base_obj->kCLDTableInd[indirect];
retval.append(GetLangProbTxt(scoringcontext, langprob));
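To illustrate the kind of output I have in mind, here is a minimal sketch of unpacking a langprob value into per-language scores. It is not CLD2's code: DecodeLangProb, LangScore and kDemoProbTable are hypothetical names, the table values are made up, and the bit layout (low byte as a row subscript into a probability table, the next three bytes as the top 3 per-script languages) is only my assumption from reading GetLangScore.

```cpp
#include <array>
#include <cstdint>

// One decoded (language, score) pair taken from a packed langprob value.
struct LangScore {
  uint8_t pslang;  // per-script language number
  int score;       // score looked up for that slot
};

// HYPOTHETICAL stand-in for CLD2's internal probability table.
// Each row holds the scores for the top-1, top-2 and top-3 slots.
// The values are invented for illustration only.
constexpr uint32_t kDemoRows = 3;
static const uint8_t kDemoProbTable[kDemoRows][3] = {
  {0, 0, 0},
  {12, 6, 3},
  {10, 8, 2},
};

// ASSUMED layout of langprob (not confirmed against the CLD2 source):
//   bits 0-7   : row subscript into the probability table
//   bits 8-15  : top-1 per-script language
//   bits 16-23 : top-2 per-script language
//   bits 24-31 : top-3 per-script language
std::array<LangScore, 3> DecodeLangProb(uint32_t langprob) {
  uint32_t row = langprob & 0xff;
  if (row >= kDemoRows) row = 0;  // out-of-range subscripts fall back to the zero row
  std::array<LangScore, 3> out{};
  for (int slot = 0; slot < 3; ++slot) {
    uint8_t pslang = static_cast<uint8_t>((langprob >> (8 * (slot + 1))) & 0xff);
    out[slot] = LangScore{pslang, kDemoProbTable[row][slot]};
  }
  return out;
}
```

Under these assumptions, calling DecodeLangProb on a value read from kCLDTableInd would return up to three (language, score) pairs instead of a single score difference, which is roughly what I would like node-cld to expose.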
Any news on this?
I'll take a look when I have the time but it will probably be months before I get to it.