Preventing overfitting with supervised encodings: adding noise

Open alexpghayes opened this issue 6 years ago • 0 comments

Probably jumping the gun here since the overfitting chapter isn't written yet: I see a lot of Kaggle entries adding small amounts of Gaussian noise during feature engineering (in likelihood encodings, for example) to prevent overfitting, which feels a bit weird to me.

I'd imagine that averaged out-of-fold predictions might be more appropriate, but that's more computationally expensive. I'd love to see a discussion comparing these two approaches and when one might be better than the other. I'd also be curious how you might select the amount of noise to add, which seems like it would require a validation set anyway.
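For concreteness, here is a minimal Python sketch of the two approaches being compared. Neither the book nor this issue specifies an implementation, so the data, fold count, and noise level below are all hypothetical illustrations, not anyone's actual method:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: one categorical feature and a numeric target.
n = 1000
cats = rng.integers(0, 10, size=n)           # category labels 0..9
y = cats * 0.1 + rng.normal(0, 1, size=n)    # target correlated with category
overall_mean = y.mean()

# --- Approach 1: likelihood (target) encoding plus Gaussian noise ---
# Encode each category by its mean target on the full data, then jitter
# the encoding to limit overfitting. The noise SD is arbitrary here;
# as the issue notes, tuning it would likely need a validation set.
cat_means = {c: y[cats == c].mean() for c in np.unique(cats)}
noise_sd = 0.05
enc_noisy = (np.array([cat_means[c] for c in cats])
             + rng.normal(0, noise_sd, size=n))

# --- Approach 2: out-of-fold target encoding ---
# Each row is encoded with a category mean computed only from the other
# folds, so a row's own target never leaks into its encoding.
k = 5
folds = rng.integers(0, k, size=n)
enc_oof = np.empty(n)
for f in range(k):
    train = folds != f
    means = {c: y[train & (cats == c)].mean()
             for c in np.unique(cats[train])}
    # Fall back to the overall mean for categories unseen in the training folds.
    enc_oof[folds == f] = [means.get(c, overall_mean)
                           for c in cats[folds == f]]
```

The trade-off in the question shows up directly: approach 1 is a single pass over the data plus a noise draw, while approach 2 recomputes the category means k times.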

alexpghayes commented Jun 18 '18