Common Date Transformers

Open jmackwinn opened this issue 8 years ago • 2 comments

First off, thanks to the devs for creating such an awesome and useful library. Just a suggestion: it would be great to add a few date transformers to this library. For example, pass in a list of date columns and, for each column, get back separate columns for year, month, weekday, hour, etc. Here is a rudimentary date differ transformer I use often.

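import pandas as pd
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin

# BaseEstimator supplies get_params/set_params, so the transformer can be cloned in pipelines
class DateDiffer(BaseEstimator, TransformerMixin):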
    '''
    Takes the difference between consecutive date columns and returns the
    value in the requested unit (days by default).
    Please use DateFormatter() before using DateDiffer().

    How it works:
    If you specify 3 dates: [date1, date2, date3]
    Output will be 2 columns:
        date2 - date1
        date3 - date2

    The transformer takes the following parameter 'unit'
    (a np.timedelta64 unit code):
        Y:  year
        M:  month
        W:  week
        D:  day
        h:  hour
        m:  minute
        s:  second
        ms: millisecond
        us: microsecond
        ns: nanosecond
        ps: picosecond
        fs: femtosecond
        as: attosecond

    Note: 'Y' and 'M' have no fixed length in days, so NumPy may refuse to
    divide a nanosecond timedelta by them; 'W' and smaller units are safe.
    '''
    def __init__(self, unit='D'):
        self.unit = unit
    
    def fit(self, X, y=None):
        # stateless transformer
        return self

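    def transform(self, X):
        # assumes X is a DataFrame whose columns are all datetime64, in time order
        beg_cols = X.columns[:-1]
        end_cols = X.columns[1:]
        # DataFrame.as_matrix() was removed in pandas 1.0; to_numpy() replaces it
        Xbeg = X[beg_cols].to_numpy()
        Xend = X[end_cols].to_numpy()
        # elementwise difference of adjacent columns, scaled to the requested unit
        Xd = (Xend - Xbeg) / np.timedelta64(1, self.unit)
        diff_cols = ['->'.join(pair) for pair in zip(beg_cols, end_cols)]
        Xdiff = pd.DataFrame(Xd, index=X.index, columns=diff_cols)
        return Xdiff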
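
The date-part extraction suggested at the top of this issue could look roughly like the sketch below. `DatePartExtractor` and its `cols` / `parts` parameters are hypothetical names for illustration, not existing skutil classes, and the sketch assumes the selected columns already have a datetime64 dtype.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin


class DatePartExtractor(BaseEstimator, TransformerMixin):
    """Hypothetical sketch (not part of skutil): expand date columns into parts."""

    def __init__(self, cols=None, parts=("year", "month", "weekday", "hour")):
        self.cols = cols
        self.parts = parts

    def fit(self, X, y=None):
        # stateless transformer
        return self

    def transform(self, X):
        # assumes X is a DataFrame and the selected columns are datetime64
        cols = list(X.columns) if self.cols is None else list(self.cols)
        out = {}
        for col in cols:
            for part in self.parts:
                # Series.dt exposes year, month, weekday, hour, ... as attributes
                out[f"{col}_{part}"] = getattr(X[col].dt, part)
        return pd.DataFrame(out, index=X.index)


df = pd.DataFrame({"signup": pd.to_datetime(["2017-10-21 01:10", "2017-10-22 13:10"])})
parts = DatePartExtractor(cols=["signup"]).fit_transform(df)
print(parts)
```

Each requested part becomes its own column named `<column>_<part>`; `weekday` follows the pandas convention of Monday = 0.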


My Python-fu is limited. For example, I am unable to generalize the DateDiffer() transformer to an entire dataframe or, say, pass it a list of columns and call fit_transform().
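One possible way to accept a list of columns is sketched below, under the assumption that the listed columns are datetime64 and every other column should pass through untouched. `ColumnDateDiffer` and its `cols` parameter are hypothetical names, not part of skutil.

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin


class ColumnDateDiffer(BaseEstimator, TransformerMixin):
    """Hypothetical variant of DateDiffer that diffs only the listed columns."""

    def __init__(self, cols=None, unit="D"):
        self.cols = cols
        self.unit = unit

    def fit(self, X, y=None):
        # stateless transformer
        return self

    def transform(self, X):
        # default to every column when no list is given
        cols = list(X.columns) if self.cols is None else list(self.cols)
        # drop() returns a new frame, so the caller's DataFrame is not modified
        out = X.drop(columns=cols)
        for beg, end in zip(cols[:-1], cols[1:]):
            # difference of adjacent date columns, scaled to the requested unit
            out[f"{beg}->{end}"] = (X[end] - X[beg]) / np.timedelta64(1, self.unit)
        return out


df = pd.DataFrame({
    "id": [1, 2],
    "ordered": pd.to_datetime(["2017-10-01", "2017-10-05"]),
    "shipped": pd.to_datetime(["2017-10-03", "2017-10-06"]),
    "delivered": pd.to_datetime(["2017-10-10", "2017-10-08"]),
})
diffs = ColumnDateDiffer(cols=["ordered", "shipped", "delivered"]).fit_transform(df)
print(diffs)
```

Here the `id` column is kept as-is and the three date columns are replaced by two difference columns, in days.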

Finally, is there a way to pass two numeric columns to a transformer and obtain the column differences? I know I can create interaction variables with sklearn's PolynomialFeatures transformer, but not df['x1'] + df['x2'], for instance.
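For sums and differences of two numeric columns, scikit-learn's `FunctionTransformer` can wrap a plain function, which may be enough here. The sketch below is one possible approach, not something skutil provides; the helper `add_sum_and_diff` and the column names `x1` / `x2` are assumptions for illustration.

```python
import pandas as pd
from sklearn.preprocessing import FunctionTransformer


def add_sum_and_diff(X):
    # hypothetical helper; assumes X is a DataFrame with numeric columns x1 and x2
    X = X.copy()  # leave the caller's DataFrame untouched
    X["x1+x2"] = X["x1"] + X["x2"]
    X["x1-x2"] = X["x1"] - X["x2"]
    return X


# validate=False hands the DataFrame to the function unchanged
combiner = FunctionTransformer(add_sum_and_diff, validate=False)

df = pd.DataFrame({"x1": [1.0, 4.0], "x2": [3.0, 1.5]})
out = combiner.fit_transform(df)
print(out)
```

Because the result is a regular transformer, it can sit in a `Pipeline` ahead of an estimator like any other preprocessing step.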

jmackwinn · Oct 21 '17 01:10

I think this is a reasonable request, and certainly a common enough use case. @charlesdrotar, let's spend some time discussing.

tgsmith61591 · Oct 22 '17 13:10

Thanks guys. I might add that some of your classes are already solving pain points a lot of us have, e.g. safelabelencoder, which encodes unseen values. I referenced your work in this Stack Overflow thread.

jmackwinn · Oct 22 '17 17:10