09_tabular: wrong splitting of training/validation set

Open JohannesStutz opened this issue 5 years ago • 5 comments

The validation set of the bulldozer dataset is supposed to contain the six-month period from November 2011 to April 2012. However, the split condition in the book does not do this:

cond = (df.saleYear<2011) | (df.saleMonth<10)

This moves only the last three months of 2011 into the validation set, and January to April 2012 are used as training data. This leads to an unfairly improved performance for both the random forest and the neural network!

This has been covered on the forums, but I think it has not been corrected yet in the book:
https://forums.fast.ai/t/lesson-7-official-topic/69896/269
https://forums.fast.ai/t/lesson-7-official-topic/69896/273
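To make the effect of that condition concrete, here is a minimal sketch in plain Python that applies the same logic to one (year, month) pair per month. The helper name `is_train_book` and the month range are illustrative assumptions, not code from the book.

```python
# Scalar version of the book's condition: True means "goes to training".
# is_train_book is a hypothetical helper name used only for this sketch.
def is_train_book(year, month):
    return year < 2011 or month < 10

# One (year, month) pair per month from January 2011 to April 2012.
months = [(2011, m) for m in range(1, 13)] + [(2012, m) for m in range(1, 5)]

# Months that fail the condition are the ones sent to the validation set.
valid = [ym for ym in months if not is_train_book(*ym)]

# 2012 months that satisfy the condition and therefore land in training.
train_2012 = [ym for ym in months if is_train_book(*ym) and ym[0] == 2012]

print(valid)
print(train_2012)
```

Only October to December 2011 fail the condition, while every 2012 month passes it because its month number is below 10.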

JohannesStutz, Nov 03 '20 12:11

Many thanks - we'll aim to fix this if we do a 2nd edition of the book.

jph00, Nov 29 '20 14:11

Is it possible to leave a short comment that the code is incorrect? I spent a lot of time on this lesson because I tried to implement everything by following the text description rather than the code. It's a pretty big error (leakage) and I get that you don't want to redo the lesson. But I think it's fair to add a short note that a mistake was made.

rajshah4, Dec 28 '20 13:12

Makes sense - will do.

jph00, Dec 28 '20 17:12

I bought a copy of the book and I'm reading through it now. I just came across that line of code and was going to create an issue for it. I'm glad someone else pointed it out!

The goal was to separate the data in the data frame between rows that occurred before November 2011 and rows that occurred later. But the dataset includes some rows from 2012 that would be incorrectly put into the training set rather than the validation set. The correct code would be something like this:

cond = (df.saleYear < 2011) | ((df.saleYear == 2011) & (df.saleMonth < 10))
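As a rough check of a year-aware split, the sketch below builds a toy frame with one row per month and inspects which months end up in validation. The toy data is an assumption standing in for the real bulldozer frame. It also uses `saleMonth < 11` rather than the `saleMonth < 10` seen in the snippets in this thread, on the assumption that the validation period should start in November 2011 as described; with `< 10`, October 2011 would land in validation as well. In pandas, each comparison needs its own parentheses because `&` binds more tightly than `==` and `<`.

```python
import pandas as pd

# Toy stand-in for the bulldozer data: one sale per month, Jan 2011 to Apr 2012.
dates = pd.date_range("2011-01-01", "2012-04-01", freq="MS")
df = pd.DataFrame({"saleYear": dates.year, "saleMonth": dates.month})

# Training rows: everything before November 2011 (assumed cutoff, see above).
# Every comparison is wrapped in parentheses before combining with & and |.
cond = (df.saleYear < 2011) | ((df.saleYear == 2011) & (df.saleMonth < 11))

train_idx = df.index[cond]
valid_idx = df.index[~cond]

# Collect the (year, month) pairs that fall into the validation set.
valid_months = [
    (int(y), int(m))
    for y, m in zip(df.saleYear[valid_idx], df.saleMonth[valid_idx])
]
print(valid_months)
```

On this toy frame the validation set covers six months, November 2011 through April 2012, and no 2012 row remains in training.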

KevinVerre, Mar 30 '21 05:03