python-diskcache

support pandas.DataFrame in cache.memoize()

Open • LudwigAJ opened this issue 1 year ago • 1 comment

Since diskcache is one of the few choices available in Python's Dash library, and since pandas.DataFrame objects are heavily used in general, could the .memoize() function support them?

As of now, I believe the function simply tries to hash the DataFrame object rather than its contents, which doesn't guarantee the same hash for the same frame (data-wise).

pandas has a function, pandas.util.hash_pandas_object, which could be used to hash the contents.
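
To illustrate hashing by content, here is a minimal sketch built on pandas.util.hash_pandas_object. The helper name frame_digest is hypothetical (it is not part of pandas or diskcache), and collapsing the per-row hashes with SHA-256 is only one possible choice.

```python
import hashlib

import pandas as pd
from pandas.util import hash_pandas_object


def frame_digest(df: pd.DataFrame) -> str:
    """Return a hex digest derived from the frame's contents (hypothetical helper)."""
    # hash_pandas_object returns one uint64 hash per row; index=True
    # includes the index values in each row's hash.
    row_hashes = hash_pandas_object(df, index=True)
    hasher = hashlib.sha256()
    # Column labels are not part of the row hashes, so fold them in
    # explicitly; dtypes are added as an extra precaution.
    hasher.update(repr(list(df.columns)).encode("utf-8"))
    hasher.update(repr([str(t) for t in df.dtypes]).encode("utf-8"))
    hasher.update(row_hashes.to_numpy().tobytes())
    return hasher.hexdigest()


a = pd.DataFrame({"x": [1, 2, 3], "y": ["a", "b", "c"]})
b = a.copy()                 # distinct object, same data
c = a.assign(x=[1, 2, 4])    # one value differs

same = frame_digest(a) == frame_digest(b)
different = frame_digest(a) != frame_digest(c)
```

Two frames holding equal data give the same digest even though they are different objects, which is the property a cache key would need.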

The user could then specify which input parameters/arguments of decorated functions are DataFrames via an additional frames parameter.

It could work similarly to the ignore parameter, i.e. something like the following:

@cache.memoize(frames={0, 'myDF'})
def someFunc(myDF, someDate, someString):
    # do some operation(s)
    return someResult
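
Until something like this exists, the idea can be approximated outside the library. The following is a rough sketch, not diskcache's API: memoize_frames and _digest are hypothetical names, and the sketch detects DataFrame arguments with isinstance instead of taking a frames parameter. The store is assumed to be any mapping-like object; a plain dict is used here, and a diskcache.Cache should be usable in its place since it supports the same `in`, get-item, and set-item operations.

```python
import functools
import hashlib

import pandas as pd
from pandas.util import hash_pandas_object


def _digest(df):
    """Content digest of a DataFrame: column labels, index, and values."""
    hasher = hashlib.sha256(repr(list(df.columns)).encode("utf-8"))
    hasher.update(hash_pandas_object(df, index=True).to_numpy().tobytes())
    return hasher.hexdigest()


def memoize_frames(cache):
    """Hypothetical decorator: memoize, keying DataFrame arguments by content."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            def norm(value):
                # Swap each DataFrame for a small, content-based stand-in.
                if isinstance(value, pd.DataFrame):
                    return ("frame", _digest(value))
                return value

            key = (
                func.__module__,
                func.__qualname__,
                tuple(norm(a) for a in args),
                tuple(sorted((k, norm(v)) for k, v in kwargs.items())),
            )
            if key in cache:
                return cache[key]
            result = func(*args, **kwargs)
            cache[key] = result
            return result
        return wrapper
    return decorator


store = {}   # stand-in for a diskcache.Cache instance
calls = []


@memoize_frames(store)
def some_func(my_df, some_string):
    calls.append(some_string)          # record real invocations
    return int(my_df["x"].sum())


df1 = pd.DataFrame({"x": [1, 2, 3]})
df2 = pd.DataFrame({"x": [1, 2, 3]})   # equal contents, different object
first = some_func(df1, "a")
second = some_func(df2, "a")           # looked up by content, not recomputed
```

Hashing a large frame on every call has a cost of its own, so this only pays off when the decorated function is clearly more expensive than the hash.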

LudwigAJ, Apr 17 '24 11:04

Given how commonly pandas DataFrames are used, it seems it would make sense to special-case this inside diskcache rather than make the user manually specify the DataFrame inputs.

Alternatively/additionally, is there a way that custom hashers/serializers could be used? (I haven't seen any obvious support for this in the docs today, but maybe I missed something, or maybe it's something that could be added?)

gabrielgrant, Jan 17 '25 01:01