Common Date Transformers
First off, thanks to the devs for creating such an awesome and useful library. Just a suggestion - it would be great to add a few date transformers to this library. For example, pass in a list of date columns and, for each column, output separate columns for year, month, weekday, hour, etc. Here is a rudimentary date differ transformer I use often.
import numpy as np
import pandas as pd
from sklearn.base import TransformerMixin


class DateDiffer(TransformerMixin):
    '''
    Takes the difference between consecutive date columns and returns the
    value in the requested unit (days by default).
    Please use DateFormatter() before using DateDiffer().

    How it works:
    If you specify 3 dates: [date1, date2, date3]
    Output will be 2 columns:
        date2 - date1
        date3 - date2

    The transformer takes the following parameter 'unit':
        Y: year
        M: month
        W: week
        D: day
        h: hour
        m: minute
        s: second
        ms: millisecond
        us: microsecond
        ns: nanosecond
        ps: picosecond
        fs: femtosecond
        as: attosecond

    Note: years and months have no fixed length, so NumPy may refuse to
    divide a nanosecond-based difference by 'Y' or 'M'.
    '''

    def __init__(self, unit='D'):
        self.unit = unit

    def fit(self, X, y=None):
        # stateless transformer
        return self

    def transform(self, X):
        # assumes X is a DataFrame whose columns are all datetime64
        beg_cols = X.columns[:-1]
        end_cols = X.columns[1:]
        # .values is used because DataFrame.as_matrix() was removed from pandas
        Xbeg = X[beg_cols].values
        Xend = X[end_cols].values
        # dividing by a one-unit timedelta converts the differences to floats
        Xd = (Xend - Xbeg) / np.timedelta64(1, self.unit)
        diff_cols = ['->'.join(pair) for pair in zip(beg_cols, end_cols)]
        Xdiff = pd.DataFrame(Xd, index=X.index, columns=diff_cols)
        return Xdiff
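For reference, the date-part idea from the suggestion above (separate year, month, weekday and hour columns per date column) could look roughly like this. It is only a minimal sketch: `DatePartExtractor`, its `cols` and `parts` parameters, and the `signup` column are hypothetical names, not anything that exists in the library, and it assumes the selected columns can be parsed by `pd.to_datetime`.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin


class DatePartExtractor(TransformerMixin, BaseEstimator):
    """Hypothetical transformer: expands date columns into date parts."""

    def __init__(self, cols=None, parts=("year", "month", "weekday", "hour")):
        self.cols = cols    # columns to expand; None means every column
        self.parts = parts  # attribute names available on the Series .dt accessor

    def fit(self, X, y=None):
        # stateless transformer
        return self

    def transform(self, X):
        cols = list(X.columns) if self.cols is None else list(self.cols)
        out = {}
        for col in cols:
            dates = pd.to_datetime(X[col])
            for part in self.parts:
                # e.g. dates.dt.weekday, where Monday is 0 and Sunday is 6
                out["%s_%s" % (col, part)] = getattr(dates.dt, part)
        return pd.DataFrame(out, index=X.index)


df = pd.DataFrame(
    {"signup": pd.to_datetime(["2017-03-05 14:30", "2018-12-31 08:00"])}
)
parts = DatePartExtractor(cols=["signup"]).fit_transform(df)
```

Here `parts` ends up with the columns `signup_year`, `signup_month`, `signup_weekday` and `signup_hour`; the original `signup` column is dropped, which may or may not be what you want.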
My Python-fu is limited - for example, I am unable to generalize the DateDiffer() transformer to an entire dataframe, or, say, pass it a list of columns and do a fit_transform().
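One way the "pass it a list of columns" version might be approached is to store the column names on the transformer and difference only those, passing everything else through. This is a sketch under assumptions, not a tested proposal for the library: `ColumnDateDiffer`, its `cols` parameter, and the example column names are all hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin


class ColumnDateDiffer(TransformerMixin, BaseEstimator):
    """Hypothetical variant of DateDiffer restricted to selected columns."""

    def __init__(self, cols=None, unit="D"):
        self.cols = cols  # ordered date columns; None means every column
        self.unit = unit

    def fit(self, X, y=None):
        # stateless transformer
        return self

    def transform(self, X):
        cols = list(X.columns) if self.cols is None else list(self.cols)
        # non-date columns are passed through untouched
        out = X.drop(columns=cols)
        for beg, end in zip(cols[:-1], cols[1:]):
            delta = pd.to_datetime(X[end]) - pd.to_datetime(X[beg])
            # dividing by a one-unit timedelta yields a float column
            out["%s->%s" % (beg, end)] = delta / np.timedelta64(1, self.unit)
        return out


df = pd.DataFrame({
    "id": [1, 2],
    "ordered": pd.to_datetime(["2020-01-01", "2020-01-10"]),
    "shipped": pd.to_datetime(["2020-01-03", "2020-01-11"]),
    "delivered": pd.to_datetime(["2020-01-08", "2020-01-15"]),
})
result = ColumnDateDiffer(cols=["ordered", "shipped", "delivered"]).fit_transform(df)
```

Because `fit` does nothing, `fit_transform()` (inherited from `TransformerMixin`) works on the whole frame, and the transformer can sit inside a `Pipeline` as long as the later steps accept a DataFrame.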
Finally, is there a way to pass two numeric columns to a transformer and obtain the column differences? I know I can create interaction variables with the sklearn polynomial transformer, but not df['x1']+df['x2'], for instance.
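For plain numeric columns, one option that may be enough is scikit-learn's `FunctionTransformer` wrapped around a small helper. The sketch below assumes the input is a DataFrame; `column_difference` and the column names `x1` and `x2` are made up for illustration. `validate=False` is passed so that the DataFrame reaches the helper unconverted.

```python
import pandas as pd
from sklearn.preprocessing import FunctionTransformer


def column_difference(X, left="x1", right="x2"):
    # hypothetical helper: returns a one-column frame holding left - right
    name = "%s-%s" % (left, right)
    return pd.DataFrame({name: X[left] - X[right]}, index=X.index)


differ = FunctionTransformer(
    column_difference,
    validate=False,
    kw_args={"left": "x1", "right": "x2"},
)

df = pd.DataFrame({"x1": [5.0, 7.5], "x2": [2.0, 10.0]})
diff = differ.fit_transform(df)
```

Swapping the `-` for `+` inside the helper gives the sum instead; any other element-wise combination of the two columns works the same way.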
I think this is a reasonable request, and certainly a common enough use case. @charlesdrotar, let's spend some time discussing this.
Thanks guys - and I might add that some of your classes are already solving pain points a lot of us have, e.g. safelabelencoder encodes unseen values. I referenced your work in this Stack Overflow thread