traceml
Why not just build this directly into pandas?
Alternatively, make the library an extension of pandas.core.generic.NDFrame. The statistics could then be run against a Series as well as a DataFrame, and they wouldn't need to be instantiated as a separate object. You could import the library and call the method directly on a DataFrame or Series:
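```python
import pandas as pd
from pandas.core.generic import NDFrame
from pandas_summary import DataFrameSummary

def monkeypatch_method(cls):
    # Attach the decorated function to `cls` as a method.
    def decorator(func):
        setattr(cls, func.__name__, func)
        return func
    return decorator

@monkeypatch_method(NDFrame)
def summary(self):
    """Series: DataFrameSummary(df)[column]; DataFrame: DataFrameSummary(df).summary()"""
    if isinstance(self, pd.Series):
        # Wrap the Series in a one-column frame and summarize that column.
        frame = self.to_frame()
        return DataFrameSummary(frame)[frame.columns[0]]
    return DataFrameSummary(self).summary()
```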
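Patching NDFrame is what makes one method available on both Series and DataFrame: both classes subclass NDFrame, so Python finds an attribute set on the base class through normal attribute lookup on either subclass. The pandas-free sketch below shows the same decorator in isolation; `Base`, `OneD` and `TwoD` are hypothetical stand-ins for NDFrame, Series and DataFrame, and the returned dict is only illustrative.

```python
def monkeypatch_method(cls):
    """Return a decorator that attaches the decorated function to `cls`."""
    def decorator(func):
        setattr(cls, func.__name__, func)
        return func  # the module-level name still refers to the plain function
    return decorator


# Hypothetical stand-ins for NDFrame and its two subclasses.
class Base:
    def __init__(self, values):
        self.values = list(values)


class OneD(Base):  # plays the role of Series
    pass


class TwoD(Base):  # plays the role of DataFrame
    pass


@monkeypatch_method(Base)
def summary(self):
    # Defined once on the base class, found by attribute lookup on every subclass.
    return {"type": type(self).__name__, "count": len(self.values)}


print(OneD([1, 2, 3]).summary())
print(TwoD([[1, 2], [3, 4]]).summary())
```

Neither subclass defines `summary` itself; both calls resolve to the single function stored on `Base`.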
Then it could be used like this:
```python
import pandas as pd
import pandas_summary  # importing the package would register .summary() on NDFrame

df = pd.DataFrame({'num_legs': [2, 4, 8, 0],
                   'num_wings': [2, 0, 0, 0],
                   'num_specimen_seen': [10, 2, 1, 8]},
                  index=['falcon', 'dog', 'spider', 'fish'])
df['num_legs'].summary()
```
instead of:
```python
import pandas as pd
from pandas_summary import DataFrameSummary

df = pd.DataFrame({'num_legs': [2, 4, 8, 0],
                   'num_wings': [2, 0, 0, 0],
                   'num_specimen_seen': [10, 2, 1, 8]},
                  index=['falcon', 'dog', 'spider', 'fish'])
dfs = DataFrameSummary(df)
dfs['num_legs']
```
The code above is untested, but it shows roughly how the method would be exposed and used. I think this makes more sense than using properties: it is called the same way as DataFrame.describe(), but it provides the additional information that the DataFrameSummary class provides, namely the summary. It's just more tightly integrated.
Developed as an extension library like this, it could also be brought into pandas more easily in the future.
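pandas also has a public extension point for this, `pd.api.extensions.register_series_accessor` and `pd.api.extensions.register_dataframe_accessor`, which avoids patching the internal `pandas.core.generic` module. An accessor class that defines `__call__` still gives the `df['num_legs'].summary()` call style. The sketch below is an assumption-laden alternative, not the library's API: the class names `SeriesSummaryAccessor` and `FrameSummaryAccessor` are hypothetical, and the statistics are a stand-in computed with plain pandas methods so the example runs without pandas_summary. In a real extension, `__call__` would delegate to DataFrameSummary instead.

```python
import pandas as pd


@pd.api.extensions.register_series_accessor("summary")
class SeriesSummaryAccessor:
    """Hypothetical accessor: s.summary() returns a few stand-in statistics."""

    def __init__(self, series: pd.Series):
        # pandas passes in the Series the accessor was looked up on.
        self._obj = series

    def __call__(self) -> pd.Series:
        s = self._obj
        stats = {
            "count": s.count(),        # non-null values
            "unique": s.nunique(),     # distinct non-null values
            "missing": s.isna().sum(), # null values
        }
        if pd.api.types.is_numeric_dtype(s):
            stats.update(mean=s.mean(), min=s.min(), max=s.max())
        return pd.Series(stats, name=s.name)


@pd.api.extensions.register_dataframe_accessor("summary")
class FrameSummaryAccessor:
    """Hypothetical accessor: df.summary() puts each column's summary side by side."""

    def __init__(self, frame: pd.DataFrame):
        self._obj = frame

    def __call__(self) -> pd.DataFrame:
        # One output column per input column; reuses the Series accessor above.
        return pd.concat({c: self._obj[c].summary() for c in self._obj.columns}, axis=1)


df = pd.DataFrame({'num_legs': [2, 4, 8, 0],
                   'num_wings': [2, 0, 0, 0]},
                  index=['falcon', 'dog', 'spider', 'fish'])
print(df['num_legs'].summary())
print(df.summary())
```

Because the accessor is registered through the documented API, the same code could move into pandas or stay in a separate package without touching pandas internals.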
@rbeesley this is an interesting idea.