radixsort.js icon indicating copy to clipboard operation
radixsort.js copied to clipboard

Radix-sort numbers with associated data

Open mourner opened this issue 8 years ago • 5 comments

@jasondavies thanks for the awesome module, its performance is mind-blowing! Still scratching my head over the following issue though:

In most algorithms, you don't just sort a set of numbers without context — usually you sort numbers associated with some additional data. For example, when sorting 2d points by x, you can't use radixSort in the current form because you also need to access y in addition to x in resulting x-sorted points.

What's the easiest way to modify this module to support this kind of use case? For example, could we additionally generate an array associating each sorted element wth original array through indices?

mourner avatar Jan 05 '16 14:01 mourner

To illustrate what I would want to achieve:

var points = [[10, 10], [25, 7], [15, 12], [12, 16], ...];

var xValues = points.map(p => p[0]); // [10, 25, 15, 12, ...]

var indices = radixSortIndices(xValues); // [0, 3, 2, 1, ...]
xValues; // [10, 12, 15, 25, ...] (sorted)

var sortedPoints = indices.map(i => points[i]);

Is this even possible (retaining the performance)?

mourner avatar Jan 05 '16 14:01 mourner

@mourner Did you ever happen to find a solution for this? I've run into this many times recently and would love to come up with a nice solution. argsort in numpy is one way to approach this, though it requires out of place storage, which shouldn't be necessary. I don't see why it would be any more complicated than striding and bookkeeping, but haven't found a module that nails it just yet.

(ah, though my data currently looks more like [x0, y0, x1, y1, x2, y2, ...] where I'd like to sort by x, for example). Seems like there's maybe a family of sort functions that could exist.

rreusser avatar May 24 '18 16:05 rreusser

@rreusser nope, so far I've been using a custom quicksort for that which swaps elements in multiple typed arrays in parallel. It's usually fast enough. Here's a recent example https://github.com/mourner/flatbush/blob/master/index.js#L199

mourner avatar May 24 '18 17:05 mourner

Makes sense. Thanks!

that sorts bbox data alongside the hilbert values

Haha that's exactly how I got here, though mine was a z-order curve for triangular mesh face bounding boxes. I've stuck with top-down at the moment though since this all took me down a bit of a rabbit hole. (30 bits for three dimensions leaves a bit (pun??) to be desired). See: bvh for top-down result. Had fun profiling it a bit with Greg Tatum, though concluded mostly that nothing too silly is happening. It's about 100ms to index 160k faces—well, with crappy mean splitting and a fixed 10 faces per leaf as opposed to the surface area heuristic—because I just can't justify further optimization at the moment. I have the vague sense that it could be a fair amount faster, but maybe slightly less than a whole order of magnitude faster, which is adequate for now…

But I digress.

Hadn't worked through flatbush, but a very helpful read. Thanks for the reference! 😄

rreusser avatar May 24 '18 18:05 rreusser

@rreusser this looks great! I also recommend looking into Hilbert curves (fast 3D algorithm here) — they may be slower to calculate, but are much better suited for bottom-up packing because the resulting BVH has less node overlap, meaning much faster queries: https://beta.observablehq.com/@mourner/hilbert-curve-packing

mourner avatar May 25 '18 08:05 mourner