
Cover Type training/validation/test set differs from reference

Open nslay opened this issue 3 years ago • 0 comments

When looking at your results, and at all the papers that compare against them, I've been scratching my head as to how you managed such high accuracy. Now I know why from looking at your code here (thank you)! You use 60% training, 20% validation and 20% testing. However, the Cover Type info file defines the training, validation and test sets differently, and it's very hard to reach even 80% accuracy with that split.

While there is nothing wrong with your evaluation, I would put a disclaimer about this somewhere.

https://archive.ics.uci.edu/ml/machine-learning-databases/covtype/

From covtype.info:

-- Classification performance
    -- first 11,340 records used for training data subset
    -- next 3,780 records used for validation data subset
    -- last 565,892 records used for testing data subset
    -- 70% Neural Network (backpropagation)
    -- 58% Linear Discriminant Analysis
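
For reference, here is a minimal sketch of the split described in covtype.info, written as index ranges over the records in file order. It assumes the data is loaded in the same order as the original file; `reference_split` and the constant names are hypothetical and not part of this repository. With these sizes the training subset is only about 2% of the records, compared with 60% in the benchmark code.

```python
# Subset sizes as listed in covtype.info (see the excerpt above).
N_TRAIN = 11_340
N_VALID = 3_780
N_TEST = 565_892


def reference_split(n_records=N_TRAIN + N_VALID + N_TEST):
    """Return (train, valid, test) index ranges in file order.

    Hypothetical helper: assumes rows are in the same order as the
    original data file and that no shuffling has been applied.
    """
    expected = N_TRAIN + N_VALID + N_TEST
    if n_records != expected:
        raise ValueError(f"expected {expected} records, got {n_records}")
    train = range(0, N_TRAIN)                      # first 11,340 records
    valid = range(N_TRAIN, N_TRAIN + N_VALID)      # next 3,780 records
    test = range(N_TRAIN + N_VALID, n_records)     # last 565,892 records
    return train, valid, test


train_idx, valid_idx, test_idx = reference_split()
train_fraction = len(train_idx) / (len(train_idx) + len(valid_idx) + len(test_idx))
print(len(train_idx), len(valid_idx), len(test_idx), round(train_fraction, 4))
```

The ranges can be used to slice a feature array loaded in file order, e.g. `X[train_idx.start:train_idx.stop]`.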

nslay • Dec 16 '21 20:12