tales-science-data
Feature normalisation
Worth doing before fitting a regression: for each feature, subtract the mean and divide by the standard deviation (std).
Watch out for the test set: don't scale it with its own mean and std, but with those of the training set (nothing from the test set should ever be seen during the training phase). The way to do this in sklearn is:
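Written out as a formula, with $\mu_j$ and $\sigma_j$ as assumed names (not from the original notes) for the mean and std of feature $j$, the scaled value of sample $i$ would be roughly:

```latex
z_{ij} = \frac{x_{ij} - \mu_j}{\sigma_j}
```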
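from sklearn import preprocessing
# Learn the per-feature mean and std from the training set only
std_scale = preprocessing.StandardScaler().fit(X_train)
# Apply the same training-set statistics to both sets
X_train_std = std_scale.transform(X_train)
X_test_std = std_scale.transform(X_test)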
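To make explicit what the fit/transform split above is doing, here is a minimal pure-Python sketch for a single feature. The helper names `fit_standardiser` and `standardise` and the toy data are made up for illustration and are not part of sklearn. It assumes the population std (ddof=0), which is what `StandardScaler` uses as far as I know, and falls back to a scale of 1 for a constant feature.

```python
from statistics import fmean, pstdev


def fit_standardiser(train_values):
    """Compute mean and std from the training values only (hypothetical helper)."""
    mean = fmean(train_values)
    std = pstdev(train_values, mu=mean)  # population std, i.e. ddof=0
    if std == 0:
        std = 1.0  # constant feature: avoid dividing by zero
    return mean, std


def standardise(values, mean, std):
    """Scale values with statistics that were fitted elsewhere (hypothetical helper)."""
    return [(v - mean) / std for v in values]


# Toy single-feature data, invented for this example
x_train = [1.0, 2.0, 3.0, 4.0, 5.0]
x_test = [6.0, 7.0]

# The statistics come from the training set only...
mean, std = fit_standardiser(x_train)

# ...and are reused unchanged on the test set
x_train_std = standardise(x_train, mean, std)
x_test_std = standardise(x_test, mean, std)
```

The scaled training values end up with mean 0 and std 1. The scaled test values generally do not, and that is expected: they are measured on the training set's scale.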
Also worth a read, on dividing the dataset into separate training and test sets: http://sebastianraschka.com/Articles/2014_about_feature_scaling.html#dividing-the-dataset-into-a-separate-training-and-test-dataset