Accuracy is dependent on label distribution!

Open Kristopher38 opened this issue 2 years ago • 0 comments

The results aren't any better than guessing "Pop_Rock" every time. After matching MIDIs with genres, the genre distribution of the matched files looks like this:

Pop_Rock         9866
Country          1059
Electronic        783
RnB               423
Latin             303
Jazz              282
New Age           230
Rap               121
International      86
Reggae             70
Folk               64
Vocal              41
Blues              32

With a total of 13360 samples, you're getting ~75% accuracy, while "Pop_Rock" alone makes up 73.8% of the whole set (9866 / 13360). If you look at a confusion matrix, e.g. for SVM, you can see that the classifier actually learns to answer "Pop_Rock" every time!
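To make the comparison concrete, here is a minimal sketch of the majority-class baseline computed from the counts listed above. The names `genre_counts` and `majority_baseline` are just illustrative, they are not taken from the tutorial code.

```python
# Genre counts copied from the distribution listed in this issue.
genre_counts = {
    "Pop_Rock": 9866, "Country": 1059, "Electronic": 783, "RnB": 423,
    "Latin": 303, "Jazz": 282, "New Age": 230, "Rap": 121,
    "International": 86, "Reggae": 70, "Folk": 64, "Vocal": 41, "Blues": 32,
}


def majority_baseline(counts):
    """Return (label, accuracy) for a classifier that always predicts
    the most frequent label in `counts`."""
    total = sum(counts.values())
    label = max(counts, key=counts.get)
    return label, counts[label] / total


label, acc = majority_baseline(genre_counts)
total = sum(genre_counts.values())
print(f"{label}: {acc:.1%} of {total} samples")  # Pop_Rock: 73.8% of 13360 samples
```

Any reported accuracy should be read against this number: a score close to 73.8% is what a constant "Pop_Rock" answer already gives.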

Confusion matrix for a small sample:
[[  0   0   0   0   0  15   0   0   0   0]
 [  0   0   0   0   0   1   0   0   0   0]
 [  0   0   0   0   0  11   0   0   0   0]
 [  0   0   0   0   0   9   0   0   0   0]
 [  0   0   0   0   0   1   0   0   0   0]
 [  0   0   0   0   0 146   0   0   0   0]
 [  0   0   0   0   0   1   0   0   0   0]
 [  0   0   0   0   0   3   0   0   0   0]
 [  0   0   0   0   0   6   0   0   0   0]
 [  0   0   0   0   0   6   0   0   0   0]]
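A quick way to see the problem in numbers is to compare plain accuracy with the mean per-class recall (balanced accuracy) for the matrix above. This is only a sketch: it assumes rows are true labels and columns are predicted labels (the scikit-learn convention), and the helper name `summarize` is made up for illustration.

```python
# Confusion matrix copied from above.
# Assumption: rows = true class, columns = predicted class.
cm = [
    [0, 0, 0, 0, 0, 15, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 11, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 9, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 146, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 3, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 6, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 6, 0, 0, 0, 0],
]


def summarize(matrix):
    """Return (accuracy, balanced_accuracy, predicted_classes)."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    # Accuracy: correct predictions (the diagonal) over all samples.
    accuracy = sum(matrix[i][i] for i in range(n)) / total
    # Recall per true class, skipping classes with no samples.
    recalls = [matrix[i][i] / sum(matrix[i]) for i in range(n) if sum(matrix[i]) > 0]
    balanced = sum(recalls) / len(recalls)
    # Column indices that were predicted at least once.
    predicted = [j for j in range(n) if any(row[j] for row in matrix)]
    return accuracy, balanced, predicted


accuracy, balanced, predicted = summarize(cm)
print(f"accuracy={accuracy:.3f} balanced={balanced:.3f} predicted={predicted}")
# accuracy=0.734 balanced=0.100 predicted=[5]
```

Only one column is ever predicted, so the balanced accuracy is 1/10, exactly what a constant prediction gives on 10 classes, even though plain accuracy is 146/199.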

So your set of extracted features doesn't provide any valuable info to the classifier.

Kristopher38, Feb 13 '23 19:02