node-cld

Expose predicted probabilities

Open loretoparisi opened this issue 5 years ago • 2 comments

An internal CLD2 function, int GetLangScore(uint32 probs, uint8 pslang), lets you get the score of a language from a packed language-probability value. It is used in several places to calculate the score of a language, given the top three predicted languages, as here:

if (indirect < static_cast<int>(obj->kCLDTableSizeOne)) {
  // Up to three languages at indirect
  uint32 langprob = obj->kCLDTableInd[indirect];
  return GetLangScore(langprob, lang1) - GetLangScore(langprob, lang2);
}

I would like the opposite: to get the probabilities for each language, which the code refers to as langprob.

There are some internal testing functions, such as string GetLangProbTxt(const ScoringContext* scoringcontext, uint32 langprob), that seem to print out these probabilities here, so in theory it should be as easy as:

uint32 langprob = base_obj->kCLDTableInd[indirect];
retval.append(GetLangProbTxt(scoringcontext, langprob));
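
To make the request concrete, here is a minimal, self-contained sketch of what exposing the per-language scores could look like. It is not CLD2's code. It assumes, from my reading of GetLangScore, that langprob packs a probability-table index in the low byte and up to three per-script language ids in the upper three bytes. The names LangScore, ScoreRow and UnpackLangProb are hypothetical, and the small table passed in is a stand-in for CLD2's internal probability table, whose real layout may differ.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// One decoded slot: a per-script language id and its quantized score.
struct LangScore {
  std::uint8_t pslang;  // per-script language id; 0 is assumed to mean "empty"
  int score;            // quantized score looked up for this slot
};

// Hypothetical stand-in for one row of CLD2's probability table:
// three quantized scores, one per language slot.
using ScoreRow = std::array<std::uint8_t, 3>;

// Unpack a 32-bit langprob into up to three (pslang, score) pairs.
// Assumed layout: byte 0 = table index, bytes 1..3 = language ids.
std::array<LangScore, 3> UnpackLangProb(std::uint32_t langprob,
                                        const std::vector<ScoreRow>& table) {
  const std::size_t index = langprob & 0xffu;
  std::array<LangScore, 3> out{};
  for (std::size_t slot = 0; slot < 3; ++slot) {
    const std::uint8_t pslang =
        static_cast<std::uint8_t>((langprob >> (8 * (slot + 1))) & 0xffu);
    int score = 0;
    // Empty slots and out-of-range indexes keep a score of 0.
    if (pslang != 0 && index < table.size()) {
      score = table[index][slot];
    }
    out[slot] = LangScore{pslang, score};
  }
  return out;
}
```

With a two-row stand-in table whose second row is {12, 8, 4}, UnpackLangProb(0x00070301, table) yields language 3 with score 12, language 7 with score 8, and an empty third slot. A binding could return these triples alongside the detected languages.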

loretoparisi, Mar 13 '19 02:03

Any news on this?

loretoparisi, Jun 25 '20 16:06

I'll take a look when I have the time but it will probably be months before I get to it.

dachev, Jul 20 '20 13:07