vortex
consider changing take for varbin
from a user report:
Trying to make sense of this: take on a chunked struct array takes ~600ms, while converting the chunked array to an arrow record batch and then doing take takes ~25ms.
what seems likely to be happening is that varbin.to_arrow is casting the varbin to a varbinview (relatively cheap) and then doing take on the views (extremely cheap).
we should consider killing the intrinsic varbin take impl and just converting to varbinview on take
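To make the cost difference concrete, here is a rough sketch in plain Rust. The `VarBin`, `View`, and `VarBinView` types below are hypothetical, simplified stand-ins and not Vortex's or Arrow's actual types (real Arrow views are 16 bytes and inline short values). The point is only that take on an offsets layout has to copy the bytes of every selected value, while take on views copies one small fixed-size entry per index and shares the byte buffer.

```rust
use std::sync::Arc;

/// Hypothetical offsets-based layout: value i is data[offsets[i]..offsets[i + 1]].
struct VarBin {
    offsets: Vec<usize>,
    data: Vec<u8>,
}

/// Hypothetical fixed-size view: a (start, len) pair into a shared byte buffer.
#[derive(Clone, Copy)]
struct View {
    start: usize,
    len: usize,
}

/// Hypothetical view-based layout: one view per value plus a shared buffer.
struct VarBinView {
    views: Vec<View>,
    data: Arc<Vec<u8>>,
}

impl VarBin {
    fn from_strs(values: &[&str]) -> Self {
        let mut offsets = vec![0];
        let mut data = Vec::new();
        for v in values {
            data.extend_from_slice(v.as_bytes());
            offsets.push(data.len());
        }
        VarBin { offsets, data }
    }

    fn get(&self, i: usize) -> &[u8] {
        &self.data[self.offsets[i]..self.offsets[i + 1]]
    }

    /// Take on the offsets layout: copies the bytes of every selected value
    /// and rebuilds the offsets, so the cost grows with the total bytes taken.
    fn take(&self, indices: &[usize]) -> VarBin {
        let mut offsets = Vec::with_capacity(indices.len() + 1);
        offsets.push(0);
        let mut data = Vec::new();
        for &i in indices {
            data.extend_from_slice(self.get(i));
            offsets.push(data.len());
        }
        VarBin { offsets, data }
    }

    /// Conversion to views: one pass over the offsets; the byte buffer is
    /// moved into an Arc without being copied.
    fn into_view(self) -> VarBinView {
        let views = self
            .offsets
            .windows(2)
            .map(|w| View { start: w[0], len: w[1] - w[0] })
            .collect();
        VarBinView { views, data: Arc::new(self.data) }
    }
}

impl VarBinView {
    fn get(&self, i: usize) -> &[u8] {
        let v = self.views[i];
        &self.data[v.start..v.start + v.len]
    }

    /// Take on views: copies one fixed-size view per index and shares the buffer.
    fn take(&self, indices: &[usize]) -> VarBinView {
        VarBinView {
            views: indices.iter().map(|&i| self.views[i]).collect(),
            data: Arc::clone(&self.data),
        }
    }
}
```

Under these assumptions, `vb.into_view().take(&indices)` yields the same values as `vb.take(&indices)` without touching the value bytes, which is consistent with the guess above about why the arrow path is so much faster.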
We've also run into ChunkedArray::take being really bad in the "shuffle" case, where the indices aren't sorted, potentially creating O(len(take_array)) chunks.
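A small sketch of why shuffled indices can blow up the chunk count. This assumes a take strategy that emits one output chunk per run of consecutive indices landing in the same source chunk; that strategy is an assumption for illustration, not a confirmed description of the current ChunkedArray::take. The function names `chunk_of` and `output_chunks` are hypothetical.

```rust
/// Find which source chunk a global index falls into, given the cumulative
/// end offset of each chunk (hypothetical helper).
fn chunk_of(chunk_ends: &[usize], idx: usize) -> usize {
    // First chunk whose end offset is strictly greater than idx.
    chunk_ends.partition_point(|&end| end <= idx)
}

/// Count the output chunks produced if take emits one chunk per run of
/// consecutive indices that land in the same source chunk (assumed strategy).
fn output_chunks(chunk_lens: &[usize], indices: &[usize]) -> usize {
    let chunk_ends: Vec<usize> = chunk_lens
        .iter()
        .scan(0usize, |acc, &len| {
            *acc += len;
            Some(*acc)
        })
        .collect();

    let mut runs = 0;
    let mut prev: Option<usize> = None;
    for &i in indices {
        let c = chunk_of(&chunk_ends, i);
        if prev != Some(c) {
            // The source chunk changed, so a new output chunk starts here.
            runs += 1;
            prev = Some(c);
        }
    }
    runs
}
```

With three source chunks of length 4, the sorted indices `[0, 1, 5, 6, 9, 10]` stay within one run per source chunk, while the same indices shuffled as `[0, 5, 9, 1, 6, 10]` switch source chunk on every element and produce one output chunk per index.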