polars
Implement (de)serialization of Series/DataFrames using IPC
Our existing implementation of Serialize/Deserialize on ChunkedArray is not very optimized, and does not support nested data well.
We should leverage IPC to improve this.
Added p-low for now. This will become the bottleneck once we want to support larger frames, but I'd like to start with cloud datasets first.
Do you think it will be less effort to fix the various bugs with our current serialization (mostly for nested types) than to switch to IPC serialization?
Performance doesn't have to be optimal at first, but the serialization does need to be correct in all cases.
No, arbitrary nesting is much more complex, and the effort would be wasted since we will switch to IPC anyway. The p-goal is to get the cloud queries running. We can start with non-nested literals for now, until we switch to IPC.
Is this closed by https://github.com/pola-rs/polars/pull/20266?