Data Transformation and Feature Engineering

Open UKVeteran opened this issue 3 years ago • 1 comments

TL;DR

Article Link

https://towardsdatascience.com/data-transformation-and-feature-engineering-e3c7dfbb4899

Author

Destin Gong

Key Takeaways

Why is data transformation needed?

An algorithm is more likely to be biased when the data distribution is skewed.
Transforming features onto the same scale lets the algorithm compare the relative relationships between data points more reliably.
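To make the second takeaway concrete, here is a minimal sketch of how an unscaled feature can dominate a distance comparison. The customer values and the helper names `min_max_scale` and `nearest` are hypothetical and not taken from the article; the scaling is hand-rolled min-max scaling rather than a scikit-learn call.

```python
import math

# Hypothetical customers: (age in years, yearly income in dollars)
customers = [(25, 50_000), (60, 51_000), (27, 54_000), (45, 90_000)]

def min_max_scale(rows):
    """Rescale each column to [0, 1] using that column's own min and max."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        tuple((v - lo) / (hi - lo) for v, lo, hi in zip(row, lows, highs))
        for row in rows
    ]

def nearest(rows, index):
    """Return the index of the row with the smallest Euclidean distance to rows[index]."""
    others = [i for i in range(len(rows)) if i != index]
    return min(others, key=lambda i: math.dist(rows[index], rows[i]))

# On raw values, income differences (thousands) swamp age differences (tens).
raw_neighbor = nearest(customers, 0)

# After scaling, both features contribute on a comparable footing.
scaled_neighbor = nearest(min_max_scale(customers), 0)

print(raw_neighbor, scaled_neighbor)
```

On the raw values the 25-year-old is matched with the 60-year-old because their incomes are close; after scaling, the match moves to the 27-year-old, whose age and income are both similar.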

Useful Code Snippets


## data scaling methods ##
import matplotlib.pyplot as plt
from sklearn.preprocessing import MinMaxScaler, RobustScaler, StandardScaler

# df is assumed to be an existing pandas DataFrame that contains the columns below
scale_var = ['Enrollment_Length', 'Recency', 'NumStorePurchases', 'clipped_Age', 'clipped_NumWebVisitsMonth']
scalers_list = [StandardScaler(), RobustScaler(), MinMaxScaler()]

for scaler in scalers_list:
    # one figure per scaler, with one histogram per variable
    fig = plt.figure(figsize=(26, 5))
    fig.suptitle(type(scaler).__name__, fontsize=20)
    for j, var in enumerate(scale_var):
        scaled_var = "scaled_" + var
        # scikit-learn expects 2D input, so select the column as a one-column DataFrame
        # note: the scaled_* columns are overwritten on each pass through scalers_list
        df[scaled_var] = scaler.fit_transform(df[[var]]).ravel()
        sub = fig.add_subplot(1, len(scale_var), j + 1)
        sub.set_xlabel(var)
        df[scaled_var].plot(kind="hist", ax=sub)
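For reference, the three scalers used above differ mainly in which statistics they subtract and divide by. The sketch below spells out those formulas in plain Python on a small made-up list with one outlier; it assumes each scaler's default settings (population standard deviation for StandardScaler, the 25th to 75th percentile range for RobustScaler, a [0, 1] target range for MinMaxScaler) and is an illustration of the arithmetic, not scikit-learn's implementation.

```python
import statistics

# Made-up sample with one large outlier
values = [1, 2, 3, 4, 100]

# StandardScaler: subtract the mean, divide by the population standard deviation
mean = statistics.fmean(values)
std = statistics.pstdev(values)
standard = [(v - mean) / std for v in values]

# RobustScaler: subtract the median, divide by the interquartile range
median = statistics.median(values)
q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
robust = [(v - median) / (q3 - q1) for v in values]

# MinMaxScaler: map the minimum to 0 and the maximum to 1
low, high = min(values), max(values)
minmax = [(v - low) / (high - low) for v in values]

print(standard)
print(robust)
print(minmax)
```

Because the median and interquartile range ignore the extremes, the robust version keeps the four typical values evenly spread between -1 and 0.5, while min-max scaling squeezes them into the bottom few percent of the range next to the outlier.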

Useful Tools

Comments/ Questions

Aug 01 '21 13:08 UKVeteran

Thank you for contributing! This looks very useful!

Aug 02 '21 14:08 khuyentran1401
