Data Transformation and Feature Engineering
TL;DR
Article Link
https://towardsdatascience.com/data-transformation-and-feature-engineering-e3c7dfbb4899
Author
Destin Gong
Key Takeaways
Why do we need data transformation?
- algorithms are more likely to produce biased results when the data distribution is skewed
- putting features on the same scale lets the algorithm compare the relative relationships between data points more reliably
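A minimal sketch of both points, using only the Python standard library. The sample values, the feature ranges, and the choice of a log transform are assumptions for illustration and are not taken from the article.

```python
import math
import statistics


def skewness(values):
    """Population skewness: the mean of the cubed z-scores."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return sum(((v - mean) / std) ** 3 for v in values) / len(values)


# Hypothetical right-skewed sample (e.g. spending amounts).
spend = [5, 8, 10, 12, 15, 20, 30, 60, 150, 900]

# log1p(x) = log(1 + x) compresses the long right tail.
log_spend = [math.log1p(v) for v in spend]

skew_before = skewness(spend)
skew_after = skewness(log_spend)
print(f"skewness before: {skew_before:.2f}, after log1p: {skew_after:.2f}")

# Two hypothetical customers described by (age in years, income in dollars).
a = (25, 50_000)
b = (60, 51_000)

# On the raw values the distance is driven almost entirely by income.
raw_distance = math.dist(a, b)


def min_max(value, low, high):
    """Rescale value to [0, 1] given an assumed feature range."""
    return (value - low) / (high - low)


# Assumed feature ranges: age 18 to 90, income 0 to 200,000.
a_scaled = (min_max(a[0], 18, 90), min_max(a[1], 0, 200_000))
b_scaled = (min_max(b[0], 18, 90), min_max(b[1], 0, 200_000))

# After rescaling, the 35-year age gap is no longer drowned out.
scaled_distance = math.dist(a_scaled, b_scaled)
print(f"raw distance: {raw_distance:.1f}, scaled distance: {scaled_distance:.3f}")
```

The log transform reduces the skewness of the sample without removing it, and the rescaled distance reflects the age difference rather than the unit in which income happens to be measured.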
Useful Code Snippets
## data scaling methods ##
import matplotlib.pyplot as plt
from sklearn.preprocessing import MinMaxScaler, RobustScaler, StandardScaler

# df is the DataFrame being transformed and must already contain the columns below
scale_var = ['Enrollment_Length', 'Recency', 'NumStorePurchases', 'clipped_Age', 'clipped_NumWebVisitsMonth']
scalers_list = [StandardScaler(), RobustScaler(), MinMaxScaler()]

for scaler in scalers_list:
    # one figure per scaler, one histogram per variable
    fig = plt.figure(figsize=(26, 5))
    fig.suptitle(str(scaler), fontsize=20)
    for j, var in enumerate(scale_var):
        scaled_var = "scaled_" + var
        # scikit-learn expects a 2D array, so reshape the single column
        values = df[var].values.reshape(-1, 1)
        df[scaled_var] = scaler.fit_transform(values).ravel()
        sub = fig.add_subplot(1, len(scale_var), j + 1)
        sub.set_xlabel(var)
        # draw on the subplot explicitly rather than relying on the current axes
        df[scaled_var].plot(kind='hist', ax=sub)
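To see what the three scalers in the snippet compute, the sketch below applies each one to a small column and reproduces the result by hand. The input values are hypothetical, and the hand-written formulas assume scikit-learn's default settings (population standard deviation for `StandardScaler`, the 25th to 75th percentile range for `RobustScaler`).

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, RobustScaler, StandardScaler

# Hypothetical column with one extreme value; not data from the article.
x = np.array([1.0, 2.0, 3.0, 4.0, 100.0]).reshape(-1, 1)

standard = StandardScaler().fit_transform(x).ravel()
minmax = MinMaxScaler().fit_transform(x).ravel()
robust = RobustScaler().fit_transform(x).ravel()

# The same transformations written out by hand.
flat = x.ravel()
standard_manual = (flat - flat.mean()) / flat.std()  # std with ddof=0
minmax_manual = (flat - flat.min()) / (flat.max() - flat.min())
q25, q50, q75 = np.percentile(flat, [25, 50, 75])
robust_manual = (flat - q50) / (q75 - q25)  # centre on median, divide by IQR

for name, values in [("standard", standard), ("minmax", minmax), ("robust", robust)]:
    print(name, np.round(values, 3))
```

With an outlier like this one, min-max scaling squeezes the four ordinary values into a narrow band near 0, while the robust scaler keeps them spread around the median because the median and interquartile range are barely affected by the extreme value.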
Useful Tools
Comments/ Questions
Thank you for contributing! This looks very useful!