Reconfirming the evaluation metric
I think it might have already been discussed in #4, but just to reconfirm: what is the evaluation metric for this contest? Is it F1, which is 2 * (precision * recall) / (precision + recall), and not accuracy, which would be (sum of the diagonal of the confusion matrix) / (total number of test, or blind, samples)?
It's this:
f1_score(y_blind, y_pred, average='micro')
This is the same as the metric provided by Brendon's accuracy() function.
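For single-label multiclass predictions like these, micro-averaged F1 pools the true positives, false positives and false negatives over all classes, so it works out to plain accuracy. A quick sanity check with made-up labels (not the contest data):

```python
import numpy as np
from sklearn.metrics import f1_score, accuracy_score

# Toy single-label multiclass example (facies labelled 1-9); 7 of 10 correct.
y_blind = np.array([1, 2, 2, 3, 5, 5, 7, 8, 9, 9])
y_pred  = np.array([1, 2, 3, 3, 5, 6, 7, 8, 9, 1])

# Micro-averaged F1 pools TP/FP/FN over all classes, so for single-label
# data it equals the fraction of correct predictions.
print(f1_score(y_blind, y_pred, average='micro'))  # 0.7
print(accuracy_score(y_blind, y_pred))             # 0.7
```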
Thanks for the info @kwinkunks. I was assuming accuracy for so long. Would anybody like to help me with an R package to calculate the F1 score for multiclass? Highly appreciated.
@thanish: I do not know R, but perhaps this?
https://github.com/Azure/Azure-MachineLearning-DataScience/blob/master/Utilities/R/MultiClassEvaluation/multi_class_measure.R
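For reference, multiclass precision, recall and F1 can be read straight off the confusion matrix, whatever the language. A rough Python/NumPy sketch of that idea (the helper name is mine, not from the contest code):

```python
import numpy as np
from sklearn.metrics import confusion_matrix

def multiclass_scores(y_true, y_pred):
    """Per-class precision, recall and F1 computed from the confusion matrix."""
    cm = confusion_matrix(y_true, y_pred)            # rows = true, columns = predicted
    tp = np.diag(cm).astype(float)
    precision = tp / np.maximum(cm.sum(axis=0), 1)   # TP / predicted count per class
    recall = tp / np.maximum(cm.sum(axis=1), 1)      # TP / actual count per class
    f1 = 2 * precision * recall / np.maximum(precision + recall, 1e-12)
    return precision, recall, f1
```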
Thanks @mycarta, that really helped :)
@kwinkunks I thought the F1 score used average='weighted'.
@kwinkunks just following up on this before I submit and see my score (so it won't seem so biased!). Hopefully it's weighted instead of micro. Micro biases towards the highly populated labels, whereas in this case all facies are equally important and the class distributions are heavily skewed.
I extracted the text below from here:
If you think there are labels with more instances than others and you want to bias your metric towards the most populated ones, use micro-averaging.
If you think there are labels with more instances than others and you want to bias your metric towards the least populated ones (or at least you don't want to bias towards the most populated ones), use macro-averaging.
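To make that difference concrete, here is a toy comparison of the three averaging modes on a deliberately skewed label distribution (all numbers invented for illustration, not taken from the contest data):

```python
import numpy as np
from sklearn.metrics import f1_score

# Skewed toy labels: class 0 dominates, classes 1 and 2 are rare.
y_true = np.array([0] * 90 + [1] * 7 + [2] * 3)

# A classifier that nails the majority class but misses most rare samples.
y_pred = y_true.copy()
y_pred[90:96] = 0   # 6 of the 7 class-1 samples predicted as class 0
y_pred[97:99] = 0   # 2 of the 3 class-2 samples predicted as class 0

for avg in ('micro', 'macro', 'weighted'):
    print(avg, round(f1_score(y_true, y_pred, average=avg), 3))
# micro    ~0.92  (pooled counts, dominated by the majority class)
# macro    ~0.57  (unweighted mean of per-class F1; rare classes drag it down)
# weighted ~0.89  (per-class F1 weighted by each class's support)
```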