modin icon indicating copy to clipboard operation
modin copied to clipboard

Zero-copy (besides deserialization) to_numpy on Ray

Open vnlitvinov opened this issue 4 years ago • 1 comments
trafficstars

Current implementation deserializes all partitions as numpy arrays and then concatenates them into single array. We should be able to deserialize in pre-allocated array during ray.get() relying on pickle protocol 5.

Ref: #2815 cc: @YarShev

vnlitvinov avatar Mar 04 '21 11:03 vnlitvinov

ray.get() allows to deserialize data with zero-copy for primitive data types (if an object supports pickle protocol 5). Then, we intentionally make a copy of data so the user can mutate it so I don't think we can do anything on this matter to speed up to_numpy. I think we can close the issue. @anmyachev, @dchigarev, thoughts?

YarShev avatar Mar 14 '24 10:03 YarShev