to_pickle does not save accessor's properties
Code Sample, a copy-pastable example if possible
import pandas as pd


class ServiceMetadataClass(object):
    """Stores Metadata from Service endpoint"""

    url = None
    info = None
    renderer = None

    def __init__(self, url=None, renderer=None, original_fields=None):
        self.url = url
        self.renderer = renderer
        self.original_fields = original_fields


###########################################################################
@pd.api.extensions.register_dataframe_accessor("service")
class ServiceAccessor(object):
    _metadata = None

    def __init__(self, data):
        self._data = data

    @property
    def meta(self):
        if self._metadata is None:
            self._metadata = ServiceMetadataClass()
        return self._metadata
Problem description
A to_pickle/read_pickle round trip does not preserve the state of the accessor object.
Even if __getstate__ and __setstate__ are defined, they never get called.
Expected Output
The assertion below should pass, but it fails: df1's url is None when it should be the string that was set on df.
df = pd.DataFrame(data=[[1,2,3]], columns=['a', 'b', 'c'])
df.service.meta.url = "https://www.google.com"
fp = r"c:/temp/atestdata.pkl"
df.to_pickle(fp)
df1 = pd.read_pickle(fp)
assert df1.service.meta.url == df.service.meta.url #FAIL
Output of pd.show_versions()
INSTALLED VERSIONS
commit           : None
python           : 3.6.9.final.0
python-bits      : 64
OS               : Windows
OS-release       : 10
machine          : AMD64
processor        : Intel64 Family 6 Model 158 Stepping 10, GenuineIntel
byteorder        : little
LC_ALL           : None
LANG             : None
LOCALE           : English_United States.1252

pandas           : 1.0.0
numpy            : 1.16.5
pytz             : 2019.3
dateutil         : 2.8.1
pip              : 20.0.2
setuptools       : 45.1.0.post20200119
Cython           : None
pytest           : 5.3.5
hypothesis       : None
sphinx           : 2.2.1
blosc            : None
feather          : None
xlsxwriter       : None
lxml.etree       : 4.5.0
html5lib         : 1.0.1
pymysql          : None
psycopg2         : None
jinja2           : 2.11.1
IPython          : 7.8.0
pandas_datareader: None
bs4              : 4.8.2
bottleneck       : 1.3.2
fastparquet      : None
gcsfs            : None
lxml.etree       : 4.5.0
matplotlib       : 3.1.1
numexpr          : 2.7.0
odfpy            : None
openpyxl         : 3.0.3
pandas_gbq       : None
pyarrow          : 0.15.1
pytables         : None
pytest           : 5.3.5
pyxlsb           : None
s3fs             : None
scipy            : 1.3.1
sqlalchemy       : None
tables           : None
tabulate         : None
xarray           : None
xlrd             : 1.2.0
xlwt             : 1.3.0
xlsxwriter       : None
numba            : None
Accessors are dynamically created and are stateless. I am not sure what you hope to accomplish with the above accessor.
When you pickle an object it saves the state of the dataframe, so shouldn't that include any accessors?
Sure, if they were designed to be stateful.
Again, what are you actually trying to do here?
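To illustrate the point about statelessness, here is a minimal sketch. The accessor name `demo_state` and the class `DemoStateAccessor` are made up for this example and are not part of pandas or of the code in the report. It suggests that a value stored on the accessor instance stays visible on the same DataFrame object, but is not carried over to any new DataFrame derived from it:

```python
import pandas as pd


# Hypothetical accessor, registered only for this demonstration.
@pd.api.extensions.register_dataframe_accessor("demo_state")
class DemoStateAccessor:
    def __init__(self, pandas_obj):
        self._obj = pandas_obj
        # State kept on the accessor instance, not on the DataFrame.
        self.url = None


df = pd.DataFrame({"a": [1, 2, 3]})
df.demo_state.url = "https://www.google.com"

# The same DataFrame object reuses the accessor instance it already created.
same_object_url = df.demo_state.url

# Any new DataFrame gets a freshly constructed accessor with default state.
copied_url = df.copy().demo_state.url
filtered_url = df[df["a"] > 1].demo_state.url
```

A DataFrame restored by read_pickle is a new object in the same sense, which would be consistent with the url coming back as None in the report.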
I need to pull information into my accessor that is metadata about the source dataset. This can be things like author, data capture, methodologies, etc. I figured a simple accessor to hold this information would be best, and it works if I don't pickle it.
What would I have to implement on the accessor or class to get it to pickle?
I had __getstate__ and __setstate__ implemented, but they were never called. I did this on both my class, ServiceMetadataClass, and the accessor class, ServiceAccessor. Is there another method I need to implement to get this to work?
Maybe @TomAugspurger has a thought, but we do not support stateful accessors.
This is outside our intended use case for accessors.
It sounds like the .attrs attribute is what you want.
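As a rough sketch of that suggestion, the accessor can stay stateless and simply read and write the DataFrame's `attrs` dictionary. The accessor name `source_meta`, the class `SourceMetaAccessor`, and the `"url"` key are assumptions made for this example, and `attrs` itself is documented as experimental, so treat this as one possible approach and not as supported behaviour:

```python
import os
import tempfile

import pandas as pd


# Hypothetical stateless accessor: every value lives in DataFrame.attrs,
# so nothing is stored on the accessor instance itself.
@pd.api.extensions.register_dataframe_accessor("source_meta")
class SourceMetaAccessor:
    def __init__(self, pandas_obj):
        self._obj = pandas_obj

    @property
    def url(self):
        return self._obj.attrs.get("url")

    @url.setter
    def url(self, value):
        self._obj.attrs["url"] = value


df = pd.DataFrame(data=[[1, 2, 3]], columns=["a", "b", "c"])
df.source_meta.url = "https://www.google.com"

# Round trip through a temporary pickle file.
with tempfile.TemporaryDirectory() as tmp_dir:
    fp = os.path.join(tmp_dir, "atestdata.pkl")
    df.to_pickle(fp)
    df1 = pd.read_pickle(fp)

restored_url = df1.source_meta.url
```

Because the metadata is stored on the DataFrame and not on the accessor, the accessor that gets created for df1 after unpickling reads the same value back.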
The documentation for attrs says it's experimental. Is this slated to be a full feature in a near release?
I think the best we can do for the time being is issue a warning at pickle time if there are any accessors whose state is going to be lost.
I think attrs would be the way to go, but I want to ask my question again. What is the timeline to move this property outside of the experimental phase?
Probably once https://github.com/pandas-dev/pandas/issues/28283 is fixed.