
_metadata properties do not work with pyjanitor

Open raffaem opened this issue 6 months ago • 7 comments

Brief Description

Original properties declared in _metadata are not passed on to the results of pyjanitor manipulations.

System Information

  • Operating system: Windows
  • OS details (optional): 11
  • Python version (required): 3.13

Minimally Reproducible Code

import pandas as pd
import janitor # noqa: F401
import pandas_flavor as pf

# See: https://pandas.pydata.org/pandas-docs/stable/development/extending.html#define-original-properties
class MyDataFrame(pd.DataFrame):

    # normal properties
    _metadata = ["myvar"]

    @property
    def _constructor(self):
        return MyDataFrame

@pf.register_dataframe_method
def regvar(self):
    obj = MyDataFrame(self)
    obj.myvar = 2
    return obj

@pf.register_dataframe_method
def printvar(self):
    print(self.myvar)
    return self

df = pd.DataFrame(
    {
        "Year": [1999, 2000, 2004, 1999, 2004],
        "Taxon": [
            "Saccharina",
            "Saccharina",
            "Saccharina",
            "Agarum",
            "Agarum",
        ],
        "Abundance": [4, 5, 2, 1, 8],
    }
)
 
df2 = df.regvar().query("Taxon=='Saccharina'").printvar()

index = pd.Index(range(1999,2005),name='Year')
df2 = df.regvar().complete(index, "Taxon", sort=True).printvar()

Error Messages

First call with built-in pandas method correctly returns 2.

Second call with pyjanitor method returns:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
~\AppData\Local\Temp\ipykernel_4412\627945022.py in ?()
     39 
     40 df2 = df.regvar().query("Taxon=='Saccharina'").printvar()
     41 
     42 index = pd.Index(range(1999,2005),name='Year')
---> 43 df2 = df.regvar().complete(index, "Taxon", sort=True).printvar()

c:\Users\raffaele\venvs\base\Lib\site-packages\pandas_flavor\register.py in ?(self, *args, **kwargs)
    160                     object: The result of calling of the method.
    161                 """
    162                 global method_call_ctx_factory
    163                 if method_call_ctx_factory is None:
--> 164                     return method(self._obj, *args, **kwargs)
    165 
    166                 return handle_pandas_extension_call(
    167                     method, method_signature, self._obj, args, kwargs

~\AppData\Local\Temp\ipykernel_4412\627945022.py in ?(self)
     21 @pf.register_dataframe_method
     22 def printvar(self):
---> 23     print(self.myvar)
     24     return self

c:\Users\raffaele\venvs\base\Lib\site-packages\pandas\core\generic.py in ?(self, name)
   6295             and name not in self._accessors
   6296             and self._info_axis._can_hold_identifiers_and_holds_name(name)
   6297         ):
   6298             return self[name]
-> 6299         return object.__getattribute__(self, name)

AttributeError: 'DataFrame' object has no attribute 'myvar'

raffaem avatar May 14 '25 02:05 raffaem

hi @raffaem

the error you get:

AttributeError: 'DataFrame' object has no attribute 'myvar'

happens even without pyjanitor:

In [12]: df.myvar
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-12-6462a428274e> in ?()
----> 1 df.myvar

~/mambaforge/envs/normal/lib/python3.11/site-packages/pandas/core/generic.py in ?(self, name)
   6295             and name not in self._accessors
   6296             and self._info_axis._can_hold_identifiers_and_holds_name(name)
   6297         ):
   6298             return self[name]
-> 6299         return object.__getattribute__(self, name)

AttributeError: 'DataFrame' object has no attribute 'myvar'

In [13]: class MyDataFrame(pd.DataFrame):
    ...:
    ...:     # normal properties
    ...:     _metadata = ["myvar"]
    ...:
    ...:     @property
    ...:     def _constructor(self):
    ...:         return MyDataFrame
    ...:

In [14]: df = pd.DataFrame(
    ...:      {
    ...:          "Year": [1999, 2000, 2004, 1999, 2004],
    ...:          "Taxon": [
    ...:              "Saccharina",
    ...:              "Saccharina",
    ...:              "Saccharina",
    ...:              "Agarum",
    ...:              "Agarum",
    ...:          ],
    ...:          "Abundance": [4, 5, 2, 1, 8],
    ...:      }
    ...:  )

In [15]: df.myvar
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-15-6462a428274e> in ?()
----> 1 df.myvar

~/mambaforge/envs/normal/lib/python3.11/site-packages/pandas/core/generic.py in ?(self, name)
   6295             and name not in self._accessors
   6296             and self._info_axis._can_hold_identifiers_and_holds_name(name)
   6297         ):
   6298             return self[name]
-> 6299         return object.__getattribute__(self, name)

AttributeError: 'DataFrame' object has no attribute 'myvar'

Kindly break it down into chunks: test on pandas without pyjanitor first, then let's test the equivalent in pyjanitor. That way we can figure out what the issue is and resolve it, if possible.

Also, i'll tag @ericmjl on this as well

samukweku avatar May 14 '25 10:05 samukweku

It doesn't, see

First call with built-in pandas method correctly returns 2.

in my first post.

Also I don't understand your second example. MyDataFrame is returned by regvar.

If you instantiate a raw pandas.DataFrame and call .myvar on it, of course it doesn't exist. It's an attribute of MyDataFrame.
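
A pandas-only sketch of that distinction, assuming a recent pandas and reusing the MyDataFrame class from the first post. The value is set on a subclass instance, survives a built-in method such as query, and is gone once the result is rebuilt as a plain pandas.DataFrame. The sample data and the names kept and lost are illustrative only.

```python
import pandas as pd


class MyDataFrame(pd.DataFrame):
    # Attributes listed here are copied to results by pandas' __finalize__
    _metadata = ["myvar"]

    @property
    def _constructor(self):
        return MyDataFrame


df = MyDataFrame(
    {
        "Year": [1999, 2000, 2004],
        "Taxon": ["Saccharina", "Saccharina", "Agarum"],
    }
)
df.myvar = 2

# Built-in pandas method: the result goes through _constructor and __finalize__
kept = df.query("Taxon == 'Saccharina'")
print(type(kept).__name__, kept.myvar)

# Rebuilding as a plain DataFrame drops both the subclass and the attribute
lost = pd.DataFrame(kept)
print(type(lost).__name__, hasattr(lost, "myvar"))
```

Any function that builds its result with pd.DataFrame(...) directly, rather than through the input's _constructor, would behave like the second case.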

raffaem avatar May 14 '25 11:05 raffaem

In [15]: df.myvar

I don't understand what you are doing here.

You instantiated a raw pandas.DataFrame. Of course myvar doesn't exist.

You need to call the function regvar I provided in my first post.

raffaem avatar May 14 '25 11:05 raffaem

@raffaem i'll let someone else chime in on the issue. @ericmjl thoughts?

samukweku avatar May 14 '25 11:05 samukweku

Here is a temporary solution: https://stackoverflow.com/questions/79631026/metadata-properties-do-not-work-with-pyjanitor

raffaem avatar May 27 '25 08:05 raffaem

@raffaem that actually looks like a pretty good seed of a solution, I'm wondering if there's a way for us to do this inside pandas-flavor or pyjanitor?
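
One possible shape for such a fix, as a rough sketch only: a wrapper that re-wraps a plain DataFrame result in the caller's subclass and copies the _metadata attributes back with __finalize__. The decorator name preserve_metadata and the stand-in function drops_subclass are hypothetical. This is not the code from the Stack Overflow answer, and it is not an existing pandas-flavor or pyjanitor API.

```python
import functools

import pandas as pd


class MyDataFrame(pd.DataFrame):
    _metadata = ["myvar"]

    @property
    def _constructor(self):
        return MyDataFrame


def preserve_metadata(func):
    """Hypothetical decorator: restore the input's subclass and _metadata."""

    @functools.wraps(func)
    def wrapper(df, *args, **kwargs):
        result = func(df, *args, **kwargs)
        # Only act when a DataFrame subclass went in and a different
        # (plain) DataFrame type came out.
        if (
            isinstance(df, pd.DataFrame)
            and isinstance(result, pd.DataFrame)
            and not isinstance(result, type(df))
        ):
            # _constructor rebuilds the subclass; __finalize__ copies the
            # attributes named in _metadata from the original object.
            result = df._constructor(result).__finalize__(df)
        return result

    return wrapper


@preserve_metadata
def drops_subclass(df):
    # Stand-in for a method that builds its result as a plain DataFrame
    return pd.DataFrame(df)


src = MyDataFrame({"a": [1, 2]})
src.myvar = 2
out = drops_subclass(src)
print(type(out).__name__, out.myvar)
```

The sketch leaves plain pandas.DataFrame inputs untouched, and it does not cover a result that keeps the subclass type but has lost the attribute.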

ericmjl avatar May 27 '25 11:05 ericmjl

@raffaem that actually looks like a pretty good seed of a solution, I'm wondering if there's a way for us to do this inside pandas-flavor or pyjanitor?

Honestly I still have to understand the difference between the decorators we use and monkeypatching. See here.

I think this wouldn't happen with monkeypatching
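
A quick pandas-only experiment can help separate the two questions. It is a sketch under assumptions: it uses pandas' own register_dataframe_accessor with a callable accessor class as a stand-in for the pattern visible in the pandas_flavor traceback above (method(self._obj, ...)), and the names seen_type, seen_type_patched and seen_type_accessor are made up for illustration. In this sketch both routes hand the subclass instance to the function, which hints that the subclass may be lost inside the method body rather than in the registration step.

```python
import pandas as pd


class MyDataFrame(pd.DataFrame):
    _metadata = ["myvar"]

    @property
    def _constructor(self):
        return MyDataFrame


def seen_type(df):
    # Report which class the function actually receives
    return type(df).__name__


# Route 1: monkeypatching, the function becomes an ordinary method
pd.DataFrame.seen_type_patched = seen_type


# Route 2: a callable accessor (assumed to resemble the decorator approach)
@pd.api.extensions.register_dataframe_accessor("seen_type_accessor")
class _SeenTypeAccessor:
    def __init__(self, pandas_obj):
        self._obj = pandas_obj

    def __call__(self):
        return seen_type(self._obj)


df = MyDataFrame({"a": [1]})
print(df.seen_type_patched())
print(df.seen_type_accessor())
```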

raffaem avatar May 27 '25 12:05 raffaem