cudf icon indicating copy to clipboard operation
cudf copied to clipboard

Cleanup how args and kwargs are passed in `_fast_slow_function_call`

Open Matt711 opened this issue 1 year ago • 8 comments

Description

I think kwargs are being subsumed by args in _fast_slow_function_call when they shouldn't.

Checklist

  • [ ] I am familiar with the Contributing Guidelines.
  • [ ] New or existing tests cover these changes.
  • [ ] The documentation is up to date with these changes.

Matt711 avatar Jul 12 '24 01:07 Matt711

@Matt711 Could you explain what the issue is your are seeing?

In _fast_slow_function_call the kwargs are being included in args. I think this makes it so that kwargs (an therefore slow_kwargs and fast_kwargs) are always empty.

Matt711 avatar Jul 12 '24 03:07 Matt711

It's still a little unclear how this bug would manifest. For example, this still raises

In [1]: %load_ext cudf.pandas

In [2]: import pandas as pd

In [3]: pd.Timestamp(2020, 1, 1, year=2021)
---------------------------------------------------------------------------
Exception                                 Traceback (most recent call last)
File ~/mroeschke-cudf/python/cudf/cudf/pandas/fast_slow_proxy.py:896, in _fast_slow_function_call(func, *args, **kwargs)
    891 with nvtx.annotate(
    892     "EXECUTE_FAST",
    893     color=_CUDF_PANDAS_NVTX_COLORS["EXECUTE_FAST"],
    894     domain="cudf_pandas",
    895 ):
--> 896     fast_args, fast_kwargs = _fast_arg(args), _fast_arg(kwargs)
    897     result = func(*fast_args, **fast_kwargs)

File ~/mroeschke-cudf/python/cudf/cudf/pandas/fast_slow_proxy.py:1046, in _fast_arg(arg)
   1045 seen: set[int] = set()
-> 1046 return _transform_arg(arg, "_fsproxy_fast", seen)

File ~/mroeschke-cudf/python/cudf/cudf/pandas/fast_slow_proxy.py:973, in _transform_arg(arg, attribute_name, seen)
    971 if type(arg) is tuple:
    972     # Must come first to avoid infinite recursion
--> 973     return tuple(_transform_arg(a, attribute_name, seen) for a in arg)
    974 elif hasattr(arg, "__getnewargs_ex__"):
    975     # Partial implementation of to reconstruct with
    976     # transformed pieces
    977     # This handles scipy._lib._bunch._make_tuple_bunch

File ~/mroeschke-cudf/python/cudf/cudf/pandas/fast_slow_proxy.py:973, in <genexpr>(.0)
    971 if type(arg) is tuple:
    972     # Must come first to avoid infinite recursion
--> 973     return tuple(_transform_arg(a, attribute_name, seen) for a in arg)
    974 elif hasattr(arg, "__getnewargs_ex__"):
    975     # Partial implementation of to reconstruct with
    976     # transformed pieces
    977     # This handles scipy._lib._bunch._make_tuple_bunch

File ~/mroeschke-cudf/python/cudf/cudf/pandas/fast_slow_proxy.py:958, in _transform_arg(arg, attribute_name, seen)
    957 if typ is _Unusable:
--> 958     raise Exception("Cannot transform _Unusable")
    959 return typ

Exception: Cannot transform _Unusable

During handling of the above exception, another exception occurred:

TypeError                                 Traceback (most recent call last)
Cell In[3], line 1
----> 1 pd.Timestamp(2020, 1, 1, year=2021)

File ~/mroeschke-cudf/python/cudf/cudf/pandas/_wrappers/pandas.py:131, in Timestamp_Timedelta__new__(cls, *args, **kwargs)
    124 def Timestamp_Timedelta__new__(cls, *args, **kwargs):
    125     # Call fast/slow constructor
    126     # This takes care of running __init__ as well, but must be paired
   (...)
    129     # Timestamp & Timedelta don't always return same types as self,
    130     # hence this method is needed.
--> 131     self, _ = _fast_slow_function_call(
    132         lambda cls, args, kwargs: cls(*args, **kwargs),
    133         cls,
    134         args,
    135         kwargs,
    136     )
    137     return self

File ~/mroeschke-cudf/python/cudf/cudf/pandas/fast_slow_proxy.py:941, in _fast_slow_function_call(func, *args, **kwargs)
    939         slow_args, slow_kwargs = _slow_arg(args), _slow_arg(kwargs)
    940         with disable_module_accelerator():
--> 941             result = func(*slow_args, **slow_kwargs)
    942 return _maybe_wrap_result(result, func, *args, **kwargs), fast

File ~/mroeschke-cudf/python/cudf/cudf/pandas/_wrappers/pandas.py:132, in Timestamp_Timedelta__new__.<locals>.<lambda>(cls, args, kwargs)
    124 def Timestamp_Timedelta__new__(cls, *args, **kwargs):
    125     # Call fast/slow constructor
    126     # This takes care of running __init__ as well, but must be paired
   (...)
    129     # Timestamp & Timedelta don't always return same types as self,
    130     # hence this method is needed.
    131     self, _ = _fast_slow_function_call(
--> 132         lambda cls, args, kwargs: cls(*args, **kwargs),
    133         cls,
    134         args,
    135         kwargs,
    136     )
    137     return self

File timestamps.pyx:1725, in pandas._libs.tslibs.timestamps.Timestamp.__new__()

TypeError: __new__() got multiple values for keyword argument 'year'

mroeschke avatar Jul 17 '24 18:07 mroeschke

pd.Timestamp(2020, 1, 1, year=2021)

Yeah I would expect that example to still raise. This PR just addresses the fact that kwargs are always empty.

Matt711 avatar Jul 18 '24 13:07 Matt711

pd.Timestamp(2020, 1, 1, year=2021)

Yeah I would expect that example to still raise. This PR just addresses the fact that kwargs are always empty.

@Matt711 Could you please add a couple of tests that are currently failing and pass with this fix?

galipremsagar avatar Jul 18 '24 13:07 galipremsagar

@Matt711 Could you please add a couple of tests that are currently failing and pass with this fix?

I don't think there's a good way to test this PR because all it does is change how we call _fast_slow_function_call

  1. _fast_slow_function_call(func, args, kwargs)
  2. _fast_slow_function_call(func, *args, **kwargs)

With the second option, we unpack the args and kwargs so that kwargs are not always empty (so this is more of a general python thing).

I labeled this a bug but there's nothing really incorrect here in the sense of an incorrect result from _fast_slow_function_call(I can edit the description). I think it makes more sense to unpack the args and kwargs, so that kwargs aren't included in args. If we don't, kwargs and therefore fast_kwargs and slow_kwargs will always be empty. So _slow_arg(kwargs) is never doing anything meaningful. What do you think?

My apologies for any confusion. cc. @mroeschke

Matt711 avatar Jul 18 '24 20:07 Matt711

If there's not strict bug here, IMO I would just treat this as a "cleanup" and just remove **kwargs from _fast_slow_function_call if we never pass kwargs to it internally. Also minimizes a chance of another bug cropping up if splatting the kwargs has some unintended consequence we don't know about.

mroeschke avatar Jul 18 '24 21:07 mroeschke

I agree with @mroeschke. No need to label this a bug, but let's simplify the API to reduce the possibility of error for devs in the future.

vyasr avatar Jul 19 '24 22:07 vyasr

Looks like this needs some conflict resolution, and there are some discussions to be addressed @Matt711.

vyasr avatar Aug 16 '24 19:08 vyasr

TODO: Check if this improvement PR is still relevant.

Matt711 avatar Sep 24 '24 03:09 Matt711