pydeeplearn

test set normalization based on training set

Status: Open · snurkabill opened this issue 9 years ago · 4 comments

snurkabill · Sep 11 '15 18:09

Hi! Thanks for the contribution!

Your commit and pull request say 'test set normalization based on training set', but I do not see the two-parameter gaussianNormalization function being used anywhere.

Also, overloading does not work like this in Python: you cannot define two functions with the same name (the second definition simply replaces the first), so you have to use default arguments instead. Please see a discussion here.

Did you try running it with your change? Do you see any improvements?
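A minimal sketch of the overloading point, in plain Python with only the standard library. The helper `demo_overload_pitfall` and this list-based `gaussianNormalization(data, mean=None, std=None)` signature are hypothetical illustrations, not pydeeplearn's actual code. The sketch shows that a second `def` rebinds the name, and that one function with default arguments can cover both cases: computing statistics from the data itself (training set) or reusing statistics passed in (test set).

```python
import statistics


def demo_overload_pitfall():
    """Show that a second def with the same name replaces the first."""

    def normalize(data):  # one-parameter version
        return "one-parameter version"

    def normalize(data, mean, std):  # rebinds `normalize`; the first is gone
        return "three-parameter version"

    try:
        normalize([1.0, 2.0])  # the one-parameter version no longer exists
    except TypeError:
        return True  # calling with a single argument fails
    return False


def gaussianNormalization(data, mean=None, std=None):
    """Hypothetical sketch: z-score a list of floats.

    Missing statistics are computed from `data` itself (training-set case);
    passing `mean` and `std` reuses statistics computed elsewhere (test-set case).
    """
    if mean is None:
        mean = statistics.fmean(data)
    if std is None:
        std = statistics.pstdev(data)
    if std == 0:
        std = 1.0  # avoid division by zero for constant data
    normalized = [(x - mean) / std for x in data]
    return normalized, mean, std


# Usage: fit on the training data, then reuse those statistics for the test data.
train = [1.0, 2.0, 3.0, 4.0]
test = [2.0, 6.0]
train_z, mu, sigma = gaussianNormalization(train)
test_z, _, _ = gaussianNormalization(test, mu, sigma)
```

Default arguments keep a single entry point, so callers that pass only the data keep working while the test-set path can supply the training statistics explicitly.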

mihaelacr · Sep 15 '15 19:09

Hi,

I must admit that I didn't run these changes. I have some local changes that work, and I just tried to put them together.

First of all, I just wanted to discuss these changes; I will make a proper PR later. My point here is that the test set's normalization should be based on the parameters (e.g. mean and standard deviation) computed on the training set.

The normalization itself is done in the normalizeData function.
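The proposal above can be sketched with NumPy as follows, assuming 2-D arrays of shape (samples, features). The names `fit_normalization` and this two-argument `normalizeData(train, test)` are hypothetical; pydeeplearn's real `normalizeData` may have a different signature. The key property is that the test set never contributes to the mean or standard deviation.

```python
import numpy as np


def fit_normalization(train):
    """Compute per-feature mean and std on the training set only."""
    mean = train.mean(axis=0)
    std = train.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant features
    return mean, std


def normalizeData(train, test):
    """Hypothetical sketch: normalize both sets with training-set statistics.

    The test set is transformed exactly as an unseen sample would be,
    so no information from it leaks into the normalization parameters.
    """
    mean, std = fit_normalization(train)
    return (train - mean) / std, (test - mean) / std


rng = np.random.default_rng(0)
train = rng.normal(loc=5.0, scale=2.0, size=(1000, 3))
test = rng.normal(loc=5.0, scale=2.0, size=(200, 3))
train_n, test_n = normalizeData(train, test)
```

Only the normalized training set is guaranteed to have zero mean and unit variance; the normalized test set will be close to that only if it comes from the same distribution.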

snurkabill · Sep 15 '15 20:09

By the way, I suspect my PRs won't work on the first try. I really want to know your opinion :)

snurkabill · Sep 15 '15 20:09

OK, scale() has been renamed back to what it was before.

Motivation for scaling the test data based on the training set:

  • When data arrives online, we can't normalize the test set using its own parameters because we don't have all the data yet; we need to use the parameters already measured on the training set.
  • The model is trained on normalized data, so we should apply the same normalization to the test data. If the test data are somehow different (shifted, etc.), then our training set was sampled incorrectly.
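Both points can be illustrated with a small sketch, assuming NumPy and samples arriving one at a time. `TrainingSetNormalizer` is a hypothetical helper, not part of pydeeplearn: it freezes the statistics measured on the training set and applies them to each incoming sample. The simulated stream is deliberately shifted, so the mean of the normalized outputs drifts away from zero, which is the symptom the second point describes.

```python
import numpy as np


class TrainingSetNormalizer:
    """Hypothetical helper: freeze statistics measured on the training set
    and apply them to test samples as they arrive, one at a time."""

    def __init__(self, train):
        train = np.asarray(train, dtype=float)
        self.mean = train.mean(axis=0)
        std = train.std(axis=0)
        self.std = np.where(std == 0, 1.0, std)  # guard constant features

    def transform(self, sample):
        # Uses only the frozen training statistics; no other test data is needed.
        return (np.asarray(sample, dtype=float) - self.mean) / self.std


rng = np.random.default_rng(1)
normalizer = TrainingSetNormalizer(rng.normal(0.0, 1.0, size=(5000, 2)))

# Simulated stream whose distribution is shifted by +3 relative to training.
stream = rng.normal(3.0, 1.0, size=(500, 2))
outputs = np.array([normalizer.transform(x) for x in stream])

# Mean of the normalized stream; close to 3 per feature here, not 0,
# which flags that the training sample did not represent this data.
drift = outputs.mean(axis=0)
```

Monitoring a statistic like `drift` is one simple way to notice that the training set was not representative; how large a drift is acceptable is a judgment call not covered by this sketch.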

snurkabill · Sep 27 '15 18:09