
Data Transformation and Feature Engineering

Open • UKVeteran opened this issue 3 years ago • 1 comment

TL;DR

Article Link

https://towardsdatascience.com/data-transformation-and-feature-engineering-e3c7dfbb4899

Author

Destin Gong

Key Takeaways

Why is data transformation needed?

  • Algorithms are more likely to be biased when the data distribution is skewed.
  • Transforming features to the same scale lets the algorithm better compare the relative relationships between data points.
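The first point can be illustrated with a log transform, a common way to reduce right skew. This is a minimal sketch, not code from the article: it generates a hypothetical log-normal feature and compares pandas' sample skewness before and after `np.log1p`, which computes log(1 + x) and is safe for non-negative values.

```python
import numpy as np
import pandas as pd

# Hypothetical right-skewed feature (e.g. counts or amounts), drawn from a log-normal distribution
rng = np.random.default_rng(0)
x = pd.Series(rng.lognormal(mean=0.0, sigma=1.0, size=1_000))

# log1p computes log(1 + x): it compresses the long right tail and is defined for x >= 0
x_log = np.log1p(x)

print(f"skewness before: {x.skew():.2f}")
print(f"skewness after log1p: {x_log.skew():.2f}")
```

The transformed feature keeps the ordering of the original values but has a much shorter right tail, so a few extreme values no longer dominate the distribution.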

Useful Code Snippets


## data scaling methods ##
# Assumes `df` is a pandas DataFrame that already contains the columns listed in scale_var
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler, MinMaxScaler, RobustScaler

scale_var = ['Enrollment_Length', 'Recency', 'NumStorePurchases', 'clipped_Age', 'clipped_NumWebVisitsMonth']
scalers_list = [StandardScaler(), RobustScaler(), MinMaxScaler()]

for scaler in scalers_list:
    # One figure per scaler, with one histogram panel per variable
    fig = plt.figure(figsize=(26, 5))
    fig.suptitle(type(scaler).__name__, fontsize=20)
    for j, var in enumerate(scale_var):
        scaled_var = "scaled_" + var
        # Scalers expect 2-D input, so reshape the column to (n_samples, 1), then flatten the result
        df[scaled_var] = scaler.fit_transform(df[var].values.reshape(-1, 1)).ravel()
        sub = fig.add_subplot(1, len(scale_var), j + 1)
        sub.set_xlabel(var)
        df[scaled_var].plot(kind='hist', ax=sub)
plt.show()
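To make the difference between the three scalers concrete, the sketch below recomputes each one by hand on a small hypothetical array (with one outlier) and checks the result against scikit-learn's `fit_transform`. The formulas assume scikit-learn's default settings, e.g. `RobustScaler`'s 25th to 75th percentile range; the data are made up.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, RobustScaler, StandardScaler

# Small hypothetical column with one outlier, shaped (n_samples, 1) as sklearn expects
x = np.array([1.0, 2.0, 3.0, 4.0, 100.0]).reshape(-1, 1)

# StandardScaler: subtract the mean, divide by the population (ddof=0) standard deviation
standard = (x - x.mean()) / x.std()

# MinMaxScaler: map the observed range onto [0, 1]
minmax = (x - x.min()) / (x.max() - x.min())

# RobustScaler (defaults): subtract the median, divide by the interquartile range
q25, q75 = np.percentile(x, [25, 75])
robust = (x - np.median(x)) / (q75 - q25)

for name, manual, scaler in [
    ("StandardScaler", standard, StandardScaler()),
    ("MinMaxScaler", minmax, MinMaxScaler()),
    ("RobustScaler", robust, RobustScaler()),
]:
    # Each hand-computed result should match scikit-learn's output
    assert np.allclose(manual, scaler.fit_transform(x)), name
    print(name, manual.ravel().round(2))
```

With the outlier at 100, `MinMaxScaler` squeezes the other four values into roughly [0, 0.03], while `RobustScaler` keeps them spread between -1 and 0.5, because the median and interquartile range are not pulled by the outlier.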

Useful Tools

Comments/Questions

UKVeteran, Aug 01 '21 13:08

Thank you for contributing! This looks very useful!

khuyentran1401, Aug 02 '21 14:08