pydeeplearn
test set normalization based on training set
Hi! Thanks for the contribution!
Your commit and pull request say 'test set normalization based on training set', but I do not see the gaussianNormalization function that takes two parameters ever used.
Also, overloading does not work like this in Python: you cannot define two functions with the same name; the second definition simply replaces the first. You have to use default arguments instead. Please see a discussion here.
Did you try running the code with your change? Do you see any improvements?
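To illustrate the point about overloading, here is a minimal sketch. The names `scale_demo` and `normalize` and the `reference` parameter are hypothetical and are not taken from pydeeplearn; the sketch only shows how a second `def` rebinds a name and how a default argument can cover both call styles.

```python
from statistics import mean, pstdev

# Hypothetical names, for illustration only.
def scale_demo(data):
    return "one-argument version"

def scale_demo(data, reference):
    # This definition rebinds the name; the one-argument version is gone.
    return "two-argument version"

def normalize(values, reference=None):
    """Standardize `values` with the mean and standard deviation of
    `reference`. When `reference` is omitted, `values` itself is used."""
    if reference is None:
        reference = values
    mu = mean(reference)
    sigma = pstdev(reference) or 1.0  # avoid dividing by zero for constant data
    return [(v - mu) / sigma for v in values]

train = [1.0, 3.0]
print(normalize(train))         # statistics come from `train` itself
print(normalize([4.0], train))  # statistics come from `train`, applied to new data
```

Calling `scale_demo([1.0])` at this point raises a `TypeError`, because only the two-argument definition remains.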
Hi,
I must admit that I didn't run those changes. I have some local changes that work, and I've just tried to put them together.
First of all, I just wanted to discuss those changes; I will make a proper PR later. My point here is that the test set's normalization should be based on the attributes obtained from the training set.
Normalization itself is done in the normalizeData function.
Btw, I suspect that none of my PRs will work the first time. I really want to know your opinion :)
OK, scale() is renamed back to what it was before.
Motivation for scaling testing data based on training set:
- When data is online, we can't normalize the testing set on its own parameters because we don't have all the data yet; we need to use the already measured parameters.
- The model is based on some normalized data, so we should use that same normalization for the testing data. If the testing data are somehow different (shifted etc.), then the training set was sampled wrongly.
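The two points above can be sketched as follows. This is only an illustration under stated assumptions: the class `TrainSetScaler` and its methods are hypothetical and are not part of pydeeplearn or of this PR. The statistics are computed once on the training set and then reused for every test sample, including samples that arrive one at a time.

```python
from statistics import mean, pstdev

class TrainSetScaler:
    """Hypothetical helper: stores per-feature statistics of the training set."""

    def fit(self, rows):
        # Transpose rows into columns so each feature gets its own statistics.
        columns = list(zip(*rows))
        self.means = [mean(c) for c in columns]
        # Fall back to 1.0 for constant features to avoid division by zero.
        self.stds = [pstdev(c) or 1.0 for c in columns]
        return self

    def transform_row(self, row):
        # Uses only the stored training statistics, so it works on a single
        # sample without seeing the rest of the test set.
        return [(x - m) / s for x, m, s in zip(row, self.means, self.stds)]

train = [[1.0, 10.0], [3.0, 30.0]]
scaler = TrainSetScaler().fit(train)

# A test sample arriving online is scaled with the training statistics.
print(scaler.transform_row([2.0, 40.0]))
```

If the test data are shifted relative to the training data, the transformed values will sit far from zero, which makes the sampling problem visible instead of hiding it.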