pandas Unexpected TypeError using `pd.NamedAgg` dict with `pd.rolling`

Code Sample


import pandas as pd
animals = pd.DataFrame({'kind': ['cat', 'dog', 'cat', 'dog'],
                        'height': [9.1, 6.0, 9.5, 34.0],
                        'weight': [7.9, 7.5, 9.9, 198.0]})

# (1) - doesn't work
animals.groupby("kind").rolling(1).agg(**{'total weight': pd.NamedAgg(column='weight', aggfunc=sum),
                                          'min_weight': pd.NamedAgg(column='weight', aggfunc=min)})

# (2) - works
animals.groupby("kind").agg(**{'total weight': pd.NamedAgg(column='weight', aggfunc=sum),
                               'min_weight': pd.NamedAgg(column='weight', aggfunc=min)})

# works, but returns multiindex
animals.groupby("kind").rolling(1).agg({'weight': {'total_weight': 'sum', 'min_weight': 'min'}})

# (3) - this is what I'd want, expected output of (1)
animals.groupby("kind").rolling(1).agg({'weight': {'total_weight': 'sum', 'min_weight': 'min'}}).droplevel(0, axis=1)

Problem description

pandas 0.25 introduced named aggregation through `pd.NamedAgg` (https://pandas-docs.github.io/pandas-docs-travis/whatsnew/v0.25.0.html#groupby-aggregation-with-relabeling). I would expect to be able to pass these to `.agg()` after a groupby, and also after a groupby followed by `.rolling()`.

However, when I pass in a dict of NamedAgg tuples as shown in (1) in the example code, I get the following error: `TypeError: aggregate() missing 1 required positional argument: 'arg'`

Is this expected or am I incorrectly using this functionality? Thanks in advance!
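Where rolling `agg` rejects keyword-style named aggregations, one way to get the flat, custom column names wanted in (3) is to run each aggregation separately and assemble the result. This is a minimal sketch of a workaround, not the named-aggregation API itself; the variable names `weight_roll` and `result` are illustrative.

```python
import pandas as pd

animals = pd.DataFrame({
    "kind": ["cat", "dog", "cat", "dog"],
    "height": [9.1, 6.0, 9.5, 34.0],
    "weight": [7.9, 7.5, 9.9, 198.0],
})

# Build the grouped rolling window once, on the column of interest.
weight_roll = animals.groupby("kind")["weight"].rolling(1)

# Run each aggregation separately. Every result is a Series indexed by
# (kind, original row label), so the columns align when combined.
result = pd.DataFrame({
    "total weight": weight_roll.sum(),
    "min_weight": weight_roll.min(),
})

print(result)
```

The output has a single level of column labels, so no `droplevel` call is needed, and names that are not valid Python identifiers (such as `total weight`) work because they are ordinary dict keys.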

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit : None
python : 3.7.4.final.0
python-bits : 64
OS : Darwin
OS-release : 18.7.0
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : None
LOCALE : en_US.UTF-8
pandas : 0.25.1
numpy : 1.16.1
pytz : 2018.5
dateutil : 2.7.3
pip : 19.1.1
setuptools : 41.0.1
Cython : None
pytest : 4.3.1
hypothesis : None
sphinx : 1.8.5
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : 0.9.2
psycopg2 : 2.7.7 (dt dec pq3 ext lo64)
jinja2 : 2.10.1
IPython : 7.3.0
pandas_datareader: None
bs4 : 4.7.1
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : None
matplotlib : 3.0.3
numexpr : None
odfpy : None
openpyxl : 2.6.1
pandas_gbq : None
pyarrow : None
pytables : None
s3fs : None
scipy : 1.1.0
sqlalchemy : 1.3.3
tables : None
xarray : None
xlrd : 1.2.0
xlwt : None
xlsxwriter : None

Sep 07 '19 14:09 ghost

Should be doable, just not implemented yet. Interested in working on it?

Sep 07 '19 19:09 TomAugspurger

Looks like expanding might need to be patched as well to accommodate this new behavior? Worth doing in a single PR? Or should I open a separate issue to track it?

Sep 10 '19 14:09 christopherzimmerman

I suspect rolling / expanding / ewma will have a single implementation that's shared.
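To illustrate why one shared code path is plausible, the sketch below uses a single hypothetical helper, `named_window_agg`, that serves both rolling and expanding windows. It is not the pandas implementation: the helper name, its `make_window` parameter, and the use of plain `(column, aggfunc)` tuples in place of `pd.NamedAgg` are all assumptions made for the example.

```python
import pandas as pd


def named_window_agg(grouped, make_window, **named):
    """Hypothetical helper (not part of pandas).

    grouped     : a DataFrameGroupBy object
    make_window : callable turning a grouped column into a window object,
                  e.g. ``lambda g: g.rolling(2)`` or ``lambda g: g.expanding()``
    named       : output_name=(column, aggfunc) pairs, aggfunc given as a string
    """
    columns = {}
    for out_name, (column, aggfunc) in named.items():
        window = make_window(grouped[column])
        # A string aggfunc dispatches to the window method of the same name.
        columns[out_name] = window.agg(aggfunc)
    return pd.DataFrame(columns)


animals = pd.DataFrame({
    "kind": ["cat", "dog", "cat", "dog"],
    "height": [9.1, 6.0, 9.5, 34.0],
    "weight": [7.9, 7.5, 9.9, 198.0],
})
grouped = animals.groupby("kind")
spec = {"total weight": ("weight", "sum"), "min_weight": ("weight", "min")}

# The same helper handles both window types; only the factory differs.
rolled = named_window_agg(grouped, lambda g: g.rolling(1), **spec)
expanded = named_window_agg(grouped, lambda g: g.expanding(), **spec)
```

Because the window type is supplied by the caller, nothing in the helper is specific to `rolling`, which is the property a shared implementation would rely on.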


Sep 10 '19 14:09 TomAugspurger

This isn't implemented in master yet. Is a PR still welcome?

Sep 12 '21 13:09 leonarduschen

A PR to implement this would still be welcome.

Aug 27 '24 20:08 WillAyd

I would also love to see this functionality available to help with time series aggregations.

Sep 18 '24 09:09 admivsn

@WillAyd I created a PR implementing this enhancement: #60549

Dec 12 '24 06:12 snitish