Implement (de)serialization of Series/DataFrames using IPC

Open stinodego opened this issue 1 year ago • 3 comments

Our existing implementation of Serialize/Deserialize on ChunkedArray is not very optimized, and does not support nested data well.

We should leverage IPC to improve this.

stinodego • Jun 27 '24 19:06

Added p-low for now as it will be the bottleneck once we want to support larger frames, but I'd like to start with cloud datasets.

ritchie46 • Jun 30 '24 07:06

> Added p-low for now as it will be the bottleneck once we want to support larger frames, but I'd like to start with cloud datasets.

Do you think it will be less effort to fix the various bugs with our current serialization (mostly for nested types) than to switch to IPC serialization?

Performance doesn't have to be optimal at first, but the serialization does need to be correct in all cases.

stinodego • Jun 30 '24 07:06

No, arbitrary nesting is much more complex and it will be an effort that's in vain as we will switch to IPC anyway. The p-goal is to get the cloud queries running. We can start with non-nested literals for now, until we switch to IPC.

ritchie46 • Jun 30 '24 07:06

Is this closed by https://github.com/pola-rs/polars/pull/20266?

lukemanley • Dec 22 '24 01:12