Meaning of the "score" column

Open qihan-z opened this issue 3 years ago • 1 comments

Hi Dr. McGowan,

I'm trying to understand the score column of the classification_tbl.csv file, but I couldn't find any documentation of what the variable means or what role it plays. I'd really appreciate it if you could explain this variable or point me to where I can find information on this column. Thank you!

qihan-z, Feb 16 '22 04:02

The score is the prevalence of the given classification for the function in question within a given “lexicon”. In this case the lexicon is either members of the Leek Lab, who classified several R functions, or “crowdsource”, where we allowed anyone to classify R functions using an application we developed. For example, if the function is library, the classification is setup, the lexicon is “crowdsource”, and the score is 0.67, that means 67% of the crowdsourced participants classified the library function as setup.
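To make the arithmetic concrete, here is a minimal sketch of how a prevalence score like this could be computed from raw votes. It is only an illustration: the function name `classification_scores` and the sample votes are hypothetical, and this is not necessarily how the package itself builds classification_tbl.csv.

```python
from collections import Counter


def classification_scores(votes):
    """Return the share of votes that each classification received."""
    counts = Counter(votes)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


# Hypothetical votes from three crowdsourced participants for `library`.
votes = ["setup", "setup", "import"]
scores = classification_scores(votes)

# Two of the three participants chose "setup".
print(round(scores["setup"], 2))  # prints 0.67
```

Under this reading, the scores for one function within one lexicon sum to 1 across its classifications.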

This paper has a few examples: https://journal.r-project.org/archive/2020/RJ-2020-011/RJ-2020-011.pdf

LucyMcGowan, Feb 16 '22 12:02