to_pickle does not save accessor's properties

Open achapkowski opened this issue 5 years ago • 10 comments

Code Sample, a copy-pastable example if possible

import pandas as pd

class ServiceMetadataClass(object):
    """Stores Metadata from Service endpoint"""
    url = None
    info = None
    renderer = None
    def __init__(self, url=None, renderer=None, original_fields=None):
        self.url = url
        self.renderer = renderer
        self.original_fields = original_fields
###########################################################################
@pd.api.extensions.register_dataframe_accessor("service")
class ServiceAccessor(object):
    _metadata = None
    def __init__(self, data):
        self._data = data
    @property
    def meta(self):
        if self._metadata is None:
            self._metadata = ServiceMetadataClass()
        return self._metadata
    
 

Problem description

The to_pickle/read_pickle round trip does not preserve the state of the accessor object. Even if __getstate__ and __setstate__ are defined on the accessor, they never get called.

Expected Output

The assert statement below should pass, but it fails: df1.service.meta.url is None, where it should be the string that was set on df.

    df = pd.DataFrame(data=[[1,2,3]], columns=['a', 'b', 'c'])
    df.service.meta.url = "https://www.google.com"
    fp = r"c:/temp/atestdata.pkl"
    df.to_pickle(fp)
    df1 = pd.read_pickle(fp)
    assert df1.service.meta.url == df.service.meta.url #FAIL 

Output of pd.show_versions()

INSTALLED VERSIONS

commit : None
python : 3.6.9.final.0
python-bits : 64
OS : Windows
OS-release : 10
machine : AMD64
processor : Intel64 Family 6 Model 158 Stepping 10, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : English_United States.1252

pandas : 1.0.0
numpy : 1.16.5
pytz : 2019.3
dateutil : 2.8.1
pip : 20.0.2
setuptools : 45.1.0.post20200119
Cython : None
pytest : 5.3.5
hypothesis : None
sphinx : 2.2.1
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.5.0
html5lib : 1.0.1
pymysql : None
psycopg2 : None
jinja2 : 2.11.1
IPython : 7.8.0
pandas_datareader: None
bs4 : 4.8.2
bottleneck : 1.3.2
fastparquet : None
gcsfs : None
lxml.etree : 4.5.0
matplotlib : 3.1.1
numexpr : 2.7.0
odfpy : None
openpyxl : 3.0.3
pandas_gbq : None
pyarrow : 0.15.1
pytables : None
pytest : 5.3.5
pyxlsb : None
s3fs : None
scipy : 1.3.1
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : 1.2.0
xlwt : 1.3.0
xlsxwriter : None
numba : None

achapkowski avatar Mar 13 '20 11:03 achapkowski

Accessors are dynamically created and are stateless. I am not sure what you hope to accomplish with the above accessor.
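A minimal sketch of what "dynamically created and stateless" means in practice. The accessor name `demo_state` and the class `DemoStateAccessor` are hypothetical, made up for this illustration, and `pickle.dumps`/`pickle.loads` stand in for `to_pickle`/`read_pickle` so that no file is needed. The point is only that each DataFrame object builds its own accessor instance on first access, so anything stored on that instance does not travel with the data.

```python
import pickle

import pandas as pd


# Hypothetical accessor used only for this illustration.
@pd.api.extensions.register_dataframe_accessor("demo_state")
class DemoStateAccessor:
    def __init__(self, pandas_obj):
        self._obj = pandas_obj
        self.note = None  # state kept on the accessor instance itself


df = pd.DataFrame({"a": [1, 2, 3]})
df.demo_state.note = "set on the original"

# Both of these produce new DataFrame objects, which build fresh accessors.
restored = pickle.loads(pickle.dumps(df))
copied = df.copy()

print(df.demo_state.note)        # value stored on the original's accessor
print(restored.demo_state.note)  # accessor was rebuilt after unpickling
print(copied.demo_state.note)    # accessor was rebuilt for the copy
```

Under the behaviour described in this thread, the first line should print the string and the other two should print None: the note lived on the accessor object, which is not part of what pandas pickles or copies.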

jreback avatar Mar 13 '20 11:03 jreback

When you pickle an object it saves the state of the dataframe, so shouldn't that include any accessors?

achapkowski avatar Mar 13 '20 11:03 achapkowski

> When you pickle an object it saves the state of the dataframe, so shouldn't that include any accessors?

Sure, if they were designed to be stateful.

Again, what are you actually trying to do here?

jreback avatar Mar 13 '20 11:03 jreback

I need to pull information into my accessor that is metadata about where the source dataset came from. This can be things like author, data capture, methodologies, etc. I figured a simple accessor to hold this information would be best, and it works if I don't pickle it.

What would I have to implement on the accessor or class to get it to pickle?

I had __getstate__ and __setstate__ implemented, but they were never called. I did this on both my class, ServiceMetadataClass, and the accessor class, ServiceAccessor. Is there another method I need to implement to get this to work?

achapkowski avatar Mar 13 '20 11:03 achapkowski

Maybe @TomAugspurger has a thought, but we do not support stateful accessors.

jreback avatar Mar 13 '20 23:03 jreback

This is outside our intended use case for accessors.

It sounds like the .attrs attribute is what you want.
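As a rough illustration of the `.attrs` suggestion, the sketch below stores the URL from the original report in `DataFrame.attrs` and round-trips the frame through pickle. It uses `pickle.dumps`/`pickle.loads` in place of `to_pickle`/`read_pickle` to avoid a file path, and the `"url"` key is just a name chosen here. Since `.attrs` is documented as experimental, treat this as a sketch of current behaviour, not a guarantee.

```python
import pickle

import pandas as pd

df = pd.DataFrame(data=[[1, 2, 3]], columns=["a", "b", "c"])

# attrs is a plain dict attached to the DataFrame itself.
df.attrs["url"] = "https://www.google.com"

# In-memory round trip standing in for df.to_pickle(fp) / pd.read_pickle(fp).
df1 = pickle.loads(pickle.dumps(df))

# The metadata is part of the DataFrame's own state, so it should come back.
print(df1.attrs.get("url"))
```

Values placed in `attrs` need to be picklable themselves, so plain strings, numbers, and dicts are the safest choice.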

TomAugspurger avatar Mar 16 '20 15:03 TomAugspurger

The documentation for attrs says it's experimental. Is this slated to be a full feature in an upcoming release?

achapkowski avatar Mar 16 '20 16:03 achapkowski

I think the best we can do for the time being is issue a warning at pickle-time if there are any accessors that are going to be lost

jbrockmendel avatar Sep 21 '20 22:09 jbrockmendel

I think attrs would be the way to go, but I want to ask my question again. What is the timeline to move this property outside of the experimental phase?
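One way to keep an accessor-style interface while relying on attrs is to make the accessor a thin, stateless view over `DataFrame.attrs`. This is a sketch under assumptions, not something proposed in the thread: the accessor name `service_meta`, the class `ServiceMetaAccessor`, and the `"service"` key are all hypothetical, and the `.attrs` API is still marked experimental.

```python
import pickle

import pandas as pd


# Hypothetical accessor: it holds no data of its own, only a reference
# to the DataFrame, and reads and writes everything through df.attrs.
@pd.api.extensions.register_dataframe_accessor("service_meta")
class ServiceMetaAccessor:
    _KEY = "service"  # assumed key under which the metadata dict is stored

    def __init__(self, pandas_obj):
        self._obj = pandas_obj

    @property
    def url(self):
        return self._obj.attrs.get(self._KEY, {}).get("url")

    @url.setter
    def url(self, value):
        self._obj.attrs.setdefault(self._KEY, {})["url"] = value


df = pd.DataFrame(data=[[1, 2, 3]], columns=["a", "b", "c"])
df.service_meta.url = "https://www.google.com"

# In-memory stand-in for to_pickle / read_pickle.
df1 = pickle.loads(pickle.dumps(df))
print(df1.service_meta.url == df.service_meta.url)
```

Because the accessor is rebuilt on demand and the data lives in attrs, losing the accessor instance during pickling no longer loses the metadata.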

achapkowski avatar Sep 22 '20 08:09 achapkowski

Probably once https://github.com/pandas-dev/pandas/issues/28283 is fixed.

TomAugspurger avatar Sep 22 '20 11:09 TomAugspurger