pyjanitor
_metadata properties do not work with pyjanitor
Brief Description
Original properties declared in _metadata are not passed to the results of pyjanitor manipulations.
System Information
- Operating system: Windows
- OS details (optional): 11
- Python version (required): 3.13
Minimally Reproducible Code
import pandas as pd
import janitor  # noqa: F401
import pandas_flavor as pf


# See: https://pandas.pydata.org/pandas-docs/stable/development/extending.html#define-original-properties
class MyDataFrame(pd.DataFrame):
    # normal properties
    _metadata = ["myvar"]

    @property
    def _constructor(self):
        return MyDataFrame


@pf.register_dataframe_method
def regvar(self):
    obj = MyDataFrame(self)
    obj.myvar = 2
    return obj


@pf.register_dataframe_method
def printvar(self):
    print(self.myvar)
    return self


df = pd.DataFrame(
    {
        "Year": [1999, 2000, 2004, 1999, 2004],
        "Taxon": [
            "Saccharina",
            "Saccharina",
            "Saccharina",
            "Agarum",
            "Agarum",
        ],
        "Abundance": [4, 5, 2, 1, 8],
    }
)

df2 = df.regvar().query("Taxon=='Saccharina'").printvar()

index = pd.Index(range(1999, 2005), name="Year")
df2 = df.regvar().complete(index, "Taxon", sort=True).printvar()
Error Messages
First call with built-in pandas method correctly returns 2.
Second call with pyjanitor method returns:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
~\AppData\Local\Temp\ipykernel_4412\627945022.py in ?()
39
40 df2 = df.regvar().query("Taxon=='Saccharina'").printvar()
41
42 index = pd.Index(range(1999,2005),name='Year')
---> 43 df2 = df.regvar().complete(index, "Taxon", sort=True).printvar()
c:\Users\raffaele\venvs\base\Lib\site-packages\pandas_flavor\register.py in ?(self, *args, **kwargs)
160 object: The result of calling of the method.
161 """
162 global method_call_ctx_factory
163 if method_call_ctx_factory is None:
--> 164 return method(self._obj, *args, **kwargs)
165
166 return handle_pandas_extension_call(
167 method, method_signature, self._obj, args, kwargs
~\AppData\Local\Temp\ipykernel_4412\627945022.py in ?(self)
21 @pf.register_dataframe_method
22 def printvar(self):
---> 23 print(self.myvar)
24 return self
c:\Users\raffaele\venvs\base\Lib\site-packages\pandas\core\generic.py in ?(self, name)
6295 and name not in self._accessors
6296 and self._info_axis._can_hold_identifiers_and_holds_name(name)
6297 ):
6298 return self[name]
-> 6299 return object.__getattribute__(self, name)
AttributeError: 'DataFrame' object has no attribute 'myvar'
hi @raffaem
the error you get:
AttributeError: 'DataFrame' object has no attribute 'myvar'
happens even without pyjanitor:
In [12]: df.myvar
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-12-6462a428274e> in ?()
----> 1 df.myvar
~/mambaforge/envs/normal/lib/python3.11/site-packages/pandas/core/generic.py in ?(self, name)
6295 and name not in self._accessors
6296 and self._info_axis._can_hold_identifiers_and_holds_name(name)
6297 ):
6298 return self[name]
-> 6299 return object.__getattribute__(self, name)
AttributeError: 'DataFrame' object has no attribute 'myvar'
In [13]: class MyDataFrame(pd.DataFrame):
    ...:
    ...:     # normal properties
    ...:     _metadata = ["myvar"]
    ...:
    ...:     @property
    ...:     def _constructor(self):
    ...:         return MyDataFrame
    ...:
In [14]: df = pd.DataFrame(
    ...:     {
    ...:         "Year": [1999, 2000, 2004, 1999, 2004],
    ...:         "Taxon": [
    ...:             "Saccharina",
    ...:             "Saccharina",
    ...:             "Saccharina",
    ...:             "Agarum",
    ...:             "Agarum",
    ...:         ],
    ...:         "Abundance": [4, 5, 2, 1, 8],
    ...:     }
    ...: )
In [15]: df.myvar
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-15-6462a428274e> in ?()
----> 1 df.myvar
~/mambaforge/envs/normal/lib/python3.11/site-packages/pandas/core/generic.py in ?(self, name)
6295 and name not in self._accessors
6296 and self._info_axis._can_hold_identifiers_and_holds_name(name)
6297 ):
6298 return self[name]
-> 6299 return object.__getattribute__(self, name)
AttributeError: 'DataFrame' object has no attribute 'myvar'
kindly break it down into chunks: test on pandas without pyjanitor, then let's test the equivalent in pyjanitor. that way we can figure out what the issue is and resolve it, if possible.
i'll tag @ericmjl on this as well
It doesn't. See "First call with built-in pandas method correctly returns 2." in my first post.
Also, I don't understand your second example. MyDataFrame is returned by regvar.
If you instantiate a raw pandas.DataFrame and call .myvar on it, of course it doesn't exist. It's an attribute of MyDataFrame.
In [15]: df.myvar
I don't understand what you are doing here.
You instantiated a raw pandas.DataFrame. Of course myvar doesn't exist.
You need to call the function regvar I provided in my first post.
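To make the distinction concrete, here is a minimal pandas-only sketch (no pyjanitor or pandas-flavor involved). The column name `a` and the variable names `raw`, `sub`, and `filtered` are illustrative and not taken from the report. It assumes a pandas version where built-in methods build their results through `_constructor` and `__finalize__`, which is consistent with the first call in the report returning 2.

```python
import pandas as pd


class MyDataFrame(pd.DataFrame):
    # Names listed in _metadata are copied to results by __finalize__.
    _metadata = ["myvar"]

    @property
    def _constructor(self):
        return MyDataFrame


raw = pd.DataFrame({"a": [1, 2, 3]})  # plain DataFrame: no myvar attribute
sub = MyDataFrame(raw)                # subclass instance
sub.myvar = 2

# A plain pandas.DataFrame never has the attribute.
print(hasattr(raw, "myvar"))

# A built-in pandas method keeps both the subclass and the attribute.
filtered = sub.query("a > 1")
print(type(filtered).__name__, filtered.myvar)
```

So the attribute only exists on objects that went through `MyDataFrame`, and the question in this issue is why it survives `query` but not `complete`.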
@raffaem i'll let someone else chime in on the issue. @ericmjl thoughts?
Here is a temporary solution: https://stackoverflow.com/questions/79631026/metadata-properties-do-not-work-with-pyjanitor
@raffaem that actually looks like a pretty good seed of a solution, I'm wondering if there's a way for us to do this inside pandas-flavor or pyjanitor?
Honestly I still have to understand the difference between the decorators we use and monkeypatching. See here.
I think this wouldn't happen with monkeypatching
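For reference, one possible shape of a fix on the library side is sketched below. This is not the Stack Overflow answer and not existing pyjanitor or pandas-flavor code: `preserve_metadata` and `lossy_step` are hypothetical names, and `lossy_step` merely stands in for any method that rebuilds its result as a plain `pandas.DataFrame` (which is what the traceback suggests happens with `complete`). The only pandas calls used are the subclass constructor and `__finalize__`.

```python
import functools

import pandas as pd


class MyDataFrame(pd.DataFrame):
    _metadata = ["myvar"]

    @property
    def _constructor(self):
        return MyDataFrame


def preserve_metadata(func):
    """Hypothetical decorator: re-wrap the result in the caller's subclass."""

    @functools.wraps(func)
    def wrapper(df, *args, **kwargs):
        result = func(df, *args, **kwargs)
        # Only act when a DataFrame came back and the subclass was lost.
        if isinstance(result, pd.DataFrame) and not isinstance(result, type(df)):
            # Rebuild as the original subclass, then copy _metadata from df.
            result = type(df)(result).__finalize__(df)
        return result

    return wrapper


def lossy_step(df):
    # Hypothetical stand-in for a method that returns a plain DataFrame.
    return pd.DataFrame(df).reset_index(drop=True)


sub = MyDataFrame({"a": [1, 2, 3]})
sub.myvar = 2

plain = lossy_step(sub)                     # subclass and myvar are lost
fixed = preserve_metadata(lossy_step)(sub)  # subclass and myvar restored

print(type(plain).__name__, hasattr(plain, "myvar"))
print(type(fixed).__name__, fixed.myvar)
```

Whether such a wrapper belongs in pandas-flavor's registration decorators or in individual pyjanitor functions is an open design question. The sketch only shows that the metadata can be restored after the fact.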