
read_pickle support

Open weidinger-c opened this issue 1 year ago • 5 comments

cuDF currently has no IO support for pickle. I need this function, but calling it currently fails with the error:

AttributeError: module 'cudf' has no attribute 'read_pickle'

weidinger-c avatar Jun 17 '24 06:06 weidinger-c

Just to clarify, are you looking for this feature in cudf itself, or did you need this feature using something like cudf.pandas? (also, since you're planning on reading pickles, do you also want support for to_pickle?)

If you're just looking to pickle cudf objects, you can do this manually using the pickle module, e.g.

import pickle

import cudf

a = cudf.DataFrame({"a": [1, 2, 3]})

# Write to pickle (the context manager closes the file handle)
with open("cdf.pkl", "wb") as f:
    pickle.dump(a, f)

# Read from pickle
with open("cdf.pkl", "rb") as f:
    pickled_a = pickle.load(f)

# Confirm the round trip preserved the data
cudf.testing.assert_frame_equal(a, pickled_a)
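
If the goal is to keep call sites looking like the pandas ones, the manual approach above could be wrapped in two small helpers. This is only a sketch: `read_pickle` and `to_pickle` here are hypothetical module-level functions written for illustration, not part of the cudf API, and they rely on nothing but the standard `pickle` module, so they work for any picklable object (a plain dict is used below so the snippet runs without a GPU).

```python
import pickle
from pathlib import Path
from typing import Any, Union

PathLike = Union[str, Path]


def to_pickle(obj: Any, path: PathLike) -> None:
    """Serialize any picklable object to `path` (hypothetical helper)."""
    with open(path, "wb") as f:
        pickle.dump(obj, f, protocol=pickle.HIGHEST_PROTOCOL)


def read_pickle(path: PathLike) -> Any:
    """Load an object written by `to_pickle` (hypothetical helper).

    Only unpickle files you trust: pickle can execute arbitrary code.
    """
    with open(path, "rb") as f:
        return pickle.load(f)


if __name__ == "__main__":
    import os
    import tempfile

    # Round-trip a plain dict; a cudf.DataFrame could be passed the same way.
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "obj.pkl")
        to_pickle({"a": [1, 2, 3]}, target)
        print(read_pickle(target))
```

Unlike `pandas.read_pickle`, this sketch does not handle compressed files or URLs; it only covers the plain `pickle.load` case discussed here.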

lithomas1 avatar Jun 17 '24 17:06 lithomas1

Thanks for the reply, I know that there is a dedicated pickle module. I just wanted to compare my code without any code changes as I thought that cuDF has feature parity with pandas df.

weidinger-c avatar Jun 18 '24 05:06 weidinger-c

> Thanks for the reply, I know that there is a dedicated pickle module. I just wanted to compare my code without any code changes as I thought that cuDF has feature parity with pandas df.

Thanks for clarifying.

You might want to try cudf.pandas if you'd like to use cudf with zero code changes from pandas. (Although there is also an open issue with read_pickle there: https://github.com/rapidsai/cudf/issues/15459)
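
For reference, a minimal sketch of switching on cudf.pandas from Python rather than from the command line. `cudf.pandas.install()` is the documented entry point and needs to run before pandas is first imported; the `enable_cudf_pandas` wrapper and its fallback behaviour are assumptions added for illustration so the snippet also runs on a machine without cudf.

```python
import importlib.util


def enable_cudf_pandas() -> bool:
    """Turn on the cudf.pandas accelerator if cudf is installed.

    Hypothetical convenience wrapper: returns True when the accelerator
    was installed, False when cudf is not available on this machine.
    """
    if importlib.util.find_spec("cudf") is None:
        return False
    import cudf.pandas

    # Must be called before `import pandas` for the proxying to take effect.
    cudf.pandas.install()
    return True


if __name__ == "__main__":
    accelerated = enable_cudf_pandas()
    print("cudf.pandas enabled:", accelerated)
```

Existing scripts can also be run unchanged with `python -m cudf.pandas script.py`, and notebooks can use `%load_ext cudf.pandas`.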

lithomas1 avatar Jun 18 '24 15:06 lithomas1

> I just wanted to compare my code without any code changes as I thought that cuDF has feature parity with pandas df.

Mostly, but not completely. Other than API compatibility, is there some aspect of pandas.read_pickle that is not supported by plain pickle.load?

wence- avatar Jun 19 '24 13:06 wence-

> I just wanted to compare my code without any code changes as I thought that cuDF has feature parity with pandas df.

> Mostly, but not completely. Other than API compatibility, is there some aspect of pandas.read_pickle that is not supported by plain pickle.load?

No, at least nothing I am aware of. As I said, I just wanted to try out and test my lib with cudf with the least possible effort to see if it brings some performance gains.

weidinger-c avatar Jun 19 '24 13:06 weidinger-c