handson-ml3
[QUESTION] Why should all estimators be fitted to the training data only?
Page 45 says:
As with all estimators, it is important to fit scalers to the training data only: never use fit() or fit_transform() for anything else than the training set.
Could you please explain why it is important and what happens if this recommendation is not followed?
Hi @vasili111, that's a common and really useful question to ask. When I get asked this, I usually point folks to this Stack Overflow answer, to get an insightful explanation:
https://stackoverflow.com/questions/48692500/fit-transform-on-training-data-and-transform-on-test-data
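In short, a scaler learns statistics (such as the mean and standard deviation) when fit() is called. If those statistics are computed from the test set too, information about the test data leaks into preprocessing, and the test score tends to look more optimistic than performance on genuinely new data. Here is a minimal sketch of the recommended pattern, assuming scikit-learn's StandardScaler and a synthetic dataset made up purely for illustration (the variable names are not from the book):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic data, invented for this illustration only.
rng = np.random.default_rng(42)
X = rng.normal(loc=50.0, scale=10.0, size=(200, 3))

X_train, X_test = train_test_split(X, test_size=0.25, random_state=42)

# Recommended: learn the mean and standard deviation from the training set only...
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
# ...then reuse those same training statistics to transform the test set.
X_test_scaled = scaler.transform(X_test)

# Leaky variant (shown for contrast): fitting on all the data lets
# test-set statistics influence the preprocessing step.
leaky_scaler = StandardScaler().fit(X)

print("mean learned from train only:", scaler.mean_)
print("mean learned from all data:  ", leaky_scaler.mean_)
```

The two printed means differ because the leaky scaler has seen the test rows. With a simple scaler the effect is often small, but the same mistake with more powerful steps (feature selection, imputation, target encoding) can noticeably inflate test results.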