spacy-cld

Interpretation of score

Open Natalie-Caruana opened this issue 2 years ago • 0 comments

Hi, pycld2's detect function with returnVectors set to True returns four values (with the default, returnVectors=False, it returns only the first three). As I understand it, assuming one language is detected, spacy-cld calculates its confidence score by taking the third field of the first entry in details, the third value returned by pycld2, and dividing it by 100, i.e.

reliable, textBytesFound, details, vectors = cld2.detect(text, returnVectors=True)

spacy_score = details[0][2] / 100

However, in the documentation of pycld2's detect function, the third return value, details, is explained as follows:

details: tuple Tuple of up to three detected languages, where each is a tuple of (languageName, languageCode, percent, score). percent is what percentage of the original text was detected as this language and score is the confidence score for that language.

So if percent is the percentage of the original text detected as that language, it is not related to how good the prediction was. Shouldn't some form of normalization be applied to the fourth field, score, instead?
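To make the distinction concrete, here is a minimal sketch of how the two fields sit side by side in one details entry. It does not call pycld2; `sample_details` is a made-up value shaped like the documented tuple, and `Guess`, `parse_details` and `percent_based_score` are hypothetical names used only for illustration. `percent_based_score` mirrors the `details[0][2] / 100` computation described above and is not taken from the spacy-cld source.

```python
from typing import List, NamedTuple, Sequence, Tuple


class Guess(NamedTuple):
    """One entry of the `details` tuple, using the field names from the pycld2 docs."""
    language_name: str
    language_code: str
    percent: int    # share of the text attributed to this language
    score: float    # confidence score for this language


def parse_details(details: Sequence[Tuple[str, str, int, float]]) -> List[Guess]:
    """Give each (languageName, languageCode, percent, score) entry named fields."""
    return [Guess(*entry) for entry in details]


def percent_based_score(guess: Guess) -> float:
    """The computation described in this issue: percent divided by 100."""
    return guess.percent / 100


# Hypothetical values, invented for illustration only (not real pycld2 output).
sample_details = (
    ("ENGLISH", "en", 95, 1200.0),
    ("FRENCH", "fr", 4, 800.0),
    ("Unknown", "un", 0, 0.0),
)

for guess in parse_details(sample_details):
    # percent / 100 and score are separate quantities; only the first is
    # bounded to [0, 1] by construction.
    print(guess.language_code, percent_based_score(guess), guess.score)
```

With real input, `details` would come from `cld2.detect(text)`; the point of the sketch is only that the value at index 2 and the value at index 3 answer different questions.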

Thanks

Natalie-Caruana, Apr 29 '22 15:04