Unexpected TypeError using `pd.NamedAgg` dict with `pd.rolling`
Code Sample
import pandas as pd
animals = pd.DataFrame({'kind': ['cat', 'dog', 'cat', 'dog'],
                        'height': [9.1, 6.0, 9.5, 34.0],
                        'weight': [7.9, 7.5, 9.9, 198.0]})
# (1) - doesn't work
animals.groupby("kind").rolling(1).agg(**{'total weight': pd.NamedAgg(column='weight', aggfunc=sum),
                                          'min_weight': pd.NamedAgg(column='weight', aggfunc=min)})
# (2) - works
animals.groupby("kind").agg(**{'total weight': pd.NamedAgg(column='weight', aggfunc=sum),
                               'min_weight': pd.NamedAgg(column='weight', aggfunc=min)})
# works, but returns multiindex
animals.groupby("kind").rolling(1).agg({'weight': {'total_weight': 'sum', 'min_weight': 'min'}})
# (3) - this is what I'd want, expected output of (1)
animals.groupby("kind").rolling(1).agg({'weight': {'total_weight': 'sum', 'min_weight': 'min'}}).droplevel(0, axis=1)
Problem description
pandas 0.25 adds named aggregation through `pd.NamedAgg` (https://pandas-docs.github.io/pandas-docs-travis/whatsnew/v0.25.0.html#groupby-aggregation-with-relabeling). I would expect to be able to pass it to `.agg` after a groupby, and also after a groupby followed by `.rolling`.
However, when I try to pass in a dict of NamedAgg tuples as shown in (1) in the example code I get the following TypeError: TypeError: aggregate() missing 1 required positional argument: 'arg'
Is this expected or am I incorrectly using this functionality? Thanks in advance!
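For reference, a minimal workaround sketch that gets flat, relabeled columns without `NamedAgg`: select the column before rolling, aggregate with a list of function names, and rename afterwards. The output names `total_weight` and `min_weight` are the ones from (3); this is only one way to get that shape, not the behavior requested in this issue.

```python
import pandas as pd

animals = pd.DataFrame({
    "kind": ["cat", "dog", "cat", "dog"],
    "height": [9.1, 6.0, 9.5, 34.0],
    "weight": [7.9, 7.5, 9.9, 198.0],
})

# Selecting the column first keeps the result columns flat
# (one column per function name) instead of a MultiIndex.
flat = (
    animals.groupby("kind")["weight"]
    .rolling(1)
    .agg(["sum", "min"])
    # Relabel the aggregations by hand.
    .rename(columns={"sum": "total_weight", "min": "min_weight"})
)
print(flat)
```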
Output of pd.show_versions()
INSTALLED VERSIONS
commit : None
python : 3.7.4.final.0
python-bits : 64
OS : Darwin
OS-release : 18.7.0
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : None
LOCALE : en_US.UTF-8
pandas : 0.25.1
numpy : 1.16.1
pytz : 2018.5
dateutil : 2.7.3
pip : 19.1.1
setuptools : 41.0.1
Cython : None
pytest : 4.3.1
hypothesis : None
sphinx : 1.8.5
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : 0.9.2
psycopg2 : 2.7.7 (dt dec pq3 ext lo64)
jinja2 : 2.10.1
IPython : 7.3.0
pandas_datareader: None
bs4 : 4.7.1
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : None
matplotlib : 3.0.3
numexpr : None
odfpy : None
openpyxl : 2.6.1
pandas_gbq : None
pyarrow : None
pytables : None
s3fs : None
scipy : 1.1.0
sqlalchemy : 1.3.3
tables : None
xarray : None
xlrd : 1.2.0
xlwt : None
xlsxwriter : None
Should be doable, just not implemented yet. Interested in working on it?
Looks like expanding might need to be patched as well to accommodate this new behavior? Worth doing in a single PR? Or should I open a separate issue to track it?
I suspect rolling / expanding / ewma will have a single implementation that's shared.
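To illustrate why one shared code path seems plausible, here is a rough user-level sketch. `windowed_named_agg` is a hypothetical helper, not a pandas API and not the approach of any actual PR. It assumes only that the window object supports column selection with `[...]` and `.agg`, and that each `aggfunc` is given as a string name.

```python
import pandas as pd


def windowed_named_agg(window, **named_aggs):
    """Hypothetical helper: emulate named aggregation on a window object.

    Each keyword maps an output column name to a pd.NamedAgg(column, aggfunc).
    """
    pieces = {}
    for out_name, (column, aggfunc) in named_aggs.items():
        # Aggregate one column at a time so each result is a plain Series.
        pieces[out_name] = window[column].agg(aggfunc)
    # All pieces share the same index, so they align into one flat frame.
    return pd.DataFrame(pieces)


animals = pd.DataFrame({
    "kind": ["cat", "dog", "cat", "dog"],
    "height": [9.1, 6.0, 9.5, 34.0],
    "weight": [7.9, 7.5, 9.9, 198.0],
})

spec = {
    "total_weight": pd.NamedAgg(column="weight", aggfunc="sum"),
    "min_weight": pd.NamedAgg(column="weight", aggfunc="min"),
}

# The same helper is applied unchanged to rolling and expanding windows.
rolled = windowed_named_agg(animals.groupby("kind").rolling(1), **spec)
expanded = windowed_named_agg(animals.groupby("kind").expanding(), **spec)
```

Because the helper only relies on `[...]` and `.agg`, nothing in it is specific to `rolling`, which is the sense in which the implementation could be shared.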
This isn't implemented yet in master. Is a PR still welcome?
A PR to implement this would still be welcome.
I would also love to see this functionality available to help with time series aggregations.
@WillAyd Created a PR implementing this enhancement #60549