arrow2
arrow2 copied to clipboard
add computer-reorder feature to allow in-place sorting
If I understand it correctly, multiple-column sorting in arrow2 is done in two steps:
- Step1: sort into indices via sort_to_indices or lexsort_to_indices, and then
- Step2: use the indices to copy sorted values into new arrays via take
While the method works quite well, it sometimes causes unnecessary memory allocation in step 2. For instance, when references to the columns are unique (not shared), we can do in-place sorting without allocation.
To avoid allocation, maybe we can have a reorder
function (within a feature like compute-reorder
) to do it.
The idea has been implemented for a single slice here (source:
fn sort_by_indices<T>(data: &mut [T], mut indices: Vec<usize>) {
for idx in 0..data.len() {
if indices[idx] != idx {
let mut current_idx = idx;
loop {
let target_idx = indices[current_idx];
indices[current_idx] = current_idx;
if indices[target_idx] == target_idx {
break;
}
data.swap(current_idx, target_idx);
current_idx = target_idx;
}
}
}
}
This idea can be easily expanded to multiple columns or chunks.
Hope this makes sense. If not please comment on how in-place sorting can be done with the current version of arrow2
. Thanks!!