Reconfirming the evaluation metric
I think it might have already been discussed in #4, but just to reconfirm: what is the evaluation metric for this contest? Is it F1, which is 2 * (precision * recall) / (precision + recall), and not accuracy, which would be (sum of the diagonal of the confusion matrix) / (total number of test, or blind, samples)?
It's this:
f1_score(y_blind, y_pred, average='micro')
This is the same as the metric provided by Brendon's accuracy() function.
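For single-label multiclass predictions like these, micro-averaged F1 pools the true positives, false positives and false negatives over all classes, so it works out to plain accuracy. A quick sanity check with made-up labels (not the contest data):

```python
import numpy as np
from sklearn.metrics import f1_score, accuracy_score

# Toy single-label multiclass example (facies labelled 1-9); 7 of 10 correct.
y_blind = np.array([1, 2, 2, 3, 5, 5, 7, 8, 9, 9])
y_pred  = np.array([1, 2, 3, 3, 5, 6, 7, 8, 9, 1])

# Micro-averaged F1 pools TP/FP/FN over all classes, so for single-label
# data it equals the fraction of correct predictions.
print(f1_score(y_blind, y_pred, average='micro'))  # 0.7
print(accuracy_score(y_blind, y_pred))             # 0.7
```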
Thanks for the info @kwinkunks. I was assuming accuracy for so long. Would anybody like to help me with an R package to calculate the F1 score for multiclass? Highly appreciated.
@thanish: I do not know R, but perhaps this?
https://github.com/Azure/Azure-MachineLearning-DataScience/blob/master/Utilities/R/MultiClassEvaluation/multi_class_measure.R
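For reference, multiclass precision, recall and F1 can be read straight off the confusion matrix, whatever the language. A rough Python/NumPy sketch of that idea (the helper name is mine, not from the contest code):

```python
import numpy as np
from sklearn.metrics import confusion_matrix

def multiclass_scores(y_true, y_pred):
    """Per-class precision, recall and F1 computed from the confusion matrix."""
    cm = confusion_matrix(y_true, y_pred)            # rows = true, columns = predicted
    tp = np.diag(cm).astype(float)
    precision = tp / np.maximum(cm.sum(axis=0), 1)   # TP / predicted count per class
    recall = tp / np.maximum(cm.sum(axis=1), 1)      # TP / actual count per class
    f1 = 2 * precision * recall / np.maximum(precision + recall, 1e-12)
    return precision, recall, f1
```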
Thanks @mycarta, that really helped :)
@kwinkunks I thought the F1 score used average='weighted'.
@kwinkunks just following up on this before I submit and see my score (so it won't seem so biased!). Hopefully it's weighted instead of micro. Micro biases towards the highly populated labels, whereas in this case all facies are equally important and the class distributions are heavily skewed.
I extracted the text below from here:
If you think there are labels with more instances than others and you want to bias your metric towards the most populated ones, use micro-averaging.
If you think there are labels with more instances than others and you want to bias your metric towards the least populated ones (or at least you don't want to bias towards the most populated ones), use macro-averaging.
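To make that difference concrete, here is a toy comparison of the three averaging modes on a deliberately skewed label distribution (all numbers invented for illustration, not taken from the contest data):

```python
import numpy as np
from sklearn.metrics import f1_score

# Skewed toy labels: class 0 dominates, classes 1 and 2 are rare.
y_true = np.array([0] * 90 + [1] * 7 + [2] * 3)

# A classifier that nails the majority class but misses most rare samples.
y_pred = y_true.copy()
y_pred[90:96] = 0   # 6 of the 7 class-1 samples predicted as class 0
y_pred[97:99] = 0   # 2 of the 3 class-2 samples predicted as class 0

for avg in ('micro', 'macro', 'weighted'):
    print(avg, round(f1_score(y_true, y_pred, average=avg), 3))
# micro    ~0.92  (pooled counts, dominated by the majority class)
# macro    ~0.57  (unweighted mean of per-class F1; rare classes drag it down)
# weighted ~0.89  (per-class F1 weighted by each class's support)
```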