arrow2 icon indicating copy to clipboard operation
arrow2 copied to clipboard

add computer-reorder feature to allow in-place sorting

Open bguo068 opened this issue 1 year ago • 0 comments

If I understand it correctly, multiple-column sorting in arrow2 is done in two steps:

While the method works quite well, it sometimes causes unnecessary memory allocation in step 2. For instance, when references to the columns are unique (not shared), we can do in-place sorting without allocation.

To avoid allocation, maybe we can have a reorder function (within a feature like compute-reorder) to do it. The idea has been implemented for a single slice here (source:

fn sort_by_indices<T>(data: &mut [T], mut indices: Vec<usize>) {
    for idx in 0..data.len() {
        if indices[idx] != idx {
            let mut current_idx = idx;
            loop {
                let target_idx = indices[current_idx];
                indices[current_idx] = current_idx;
                if indices[target_idx] == target_idx {
                    break;
                }
                data.swap(current_idx, target_idx);
                current_idx = target_idx;
            }
        }
    }
}

This idea can be easily expanded to multiple columns or chunks.

Hope this makes sense. If not please comment on how in-place sorting can be done with the current version of arrow2. Thanks!!

bguo068 avatar May 13 '23 14:05 bguo068