Feature normalisation

Open martinapugliese opened this issue 7 years ago • 1 comments

To do in regression: subtract the mean of each feature, then divide by its standard deviation.
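
As a rough sketch of that operation in plain NumPy (the `standardise` helper and the toy array are made up for illustration, not part of the repo):

```python
import numpy as np

def standardise(X, mean=None, std=None):
    """Z-score each column of X; stats are computed from X unless passed in."""
    X = np.asarray(X, dtype=float)
    if mean is None:
        mean = X.mean(axis=0)
    if std is None:
        # Population std (ddof=0). A constant column would give std 0
        # and a division by zero; this sketch does not handle that case.
        std = X.std(axis=0)
    return (X - mean) / std, mean, std

# Toy data: 3 samples, 2 features on very different scales
X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
X_std, mu, sigma = standardise(X)
```

After this, each column of `X_std` has mean 0 and standard deviation 1, and `mu`, `sigma` can be reused to scale other data with the same statistics.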

Watch out for the test set: don't scale it with its own mean and std, use those of the training set (nothing from the test set should ever be seen during the training phase). The way to do this in sklearn is:

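from sklearn import preprocessing

# Fit on the training set only, so the mean and std come from training data
std_scale = preprocessing.StandardScaler().fit(X_train)
# Apply those same training-set statistics to both sets
X_train_std = std_scale.transform(X_train)
X_test_std = std_scale.transform(X_test)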
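
To check that the scaler really reuses the training statistics, here is a small self-contained sketch. The random data, seed and shapes are assumptions chosen only for illustration:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up data, just for the check
rng = np.random.default_rng(0)
X_train = rng.normal(loc=5.0, scale=2.0, size=(100, 3))
X_test = rng.normal(loc=5.0, scale=2.0, size=(20, 3))

# Fit on the training set only, then transform both sets
scaler = StandardScaler().fit(X_train)
X_train_std = scaler.transform(X_train)
X_test_std = scaler.transform(X_test)

# Scaling the test set by hand with the *training* mean and std
manual_test = (X_test - X_train.mean(axis=0)) / X_train.std(axis=0)
matches_manual = np.allclose(X_test_std, manual_test)
```

`matches_manual` should come out `True`. Note that the scaled test set will generally not have exactly zero mean and unit std, which is expected since its own statistics were never used.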

martinapugliese avatar Feb 15 '18 15:02 martinapugliese

Also worth a read: http://sebastianraschka.com/Articles/2014_about_feature_scaling.html#dividing-the-dataset-into-a-separate-training-and-test-dataset

martinapugliese avatar Feb 15 '18 15:02 martinapugliese