hpc-algorithms
                                
                                 hpc-algorithms copied to clipboard
                                
                                    hpc-algorithms copied to clipboard
                            
                            
                            
                        Order arrays of pairs of uint32?
Thanks for providing this package!
If one wants to order arrays of pairs of uint32 for the first element of the pair. Is there a faster than this?
Make an array A of pairs Uint32Array(2) and define function getKey(e) { return e[0]; } and then run RadixSortLsdUdtUInt32(A, getKey);
(Ideally, there would be just one Uint32array and one would sort the even indices together with their odd neighbors.)
You're welcome! Glad you're finding it useful. What level of performance gains are you experiencing? A custom version of LSD Radix Sort can be created which would process an array of UInt32's, where even indexes hold keys to be sorted by and odd indexes hold the data items associated with each key. This would most likely lead to the highest performance. Javascript is not my expertise, and I don't know if an array of pairs is slower to access than an single array. A faster version of LSD Radix Sort has been developed ( https://duvanenko.tech.blog/2019/02/27/lsd-radix-sort-performance-improvements/) in C++ and C#, which can be ported to Javascript to provide higher performance. -Victor
On Wed, Nov 8, 2023 at 6:05 AM 羽洲 @.***> wrote:
Thanks for providing this package!
If one wants to order arrays of pairs of uint32 for the first element of the pair. Is there a faster than this?
Make an array A of pairs Uint32Array(2) and define function getKey(e) { return e[0]; } and then run RadixSortLsdUdtUInt32(A, getKey);
(Ideally, there would be just one Uint32array and one would sort the even indices together with their odd neighbors.)
— Reply to this email directly, view it on GitHub https://github.com/DragonSpit/hpc-algorithms/issues/1, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACE6YHZ2IF2TTPQMDNIWIE3YDNROPAVCNFSM6AAAAAA7CV3FA2VHI2DSMVQWIX3LMV43ASLTON2WKOZRHE4DGMZUGI4DQMI . You are receiving this because you are subscribed to this thread.Message ID: @.***>
Hi Victor, Thanks for your message.
Ultimately, I'm not targeting JS for this sorting algorithm, but WebGPU. A parallel radix sort for WebGPU will be useful for many people, since it can be used for many graphics related problems like collision detection, for example.
From reading your blogs I can see you are an expert for parallel algorithms! I was chatting with Fei Yang who made this WebGPU implementation of a radix sort based on the latest, fastest algorithm (I believe?). I would like to understand this algorithm better and update Fei Yang's code a bit to make a proper interface for JS.
I actually don't know why I'm writing you all this haha, but maybe if I have a question in the future I might ask you : )
Coming back to the original question, I guess sorting for a uint32 key/value pair essentially doubles the computation time compared to sorting a pure uint32 array, because there is no way around repositioning the values the same way as the keys. This can be seen on slide 20 of this document, explaining the same algorithm as above. Extending Fei Yang's code to sort key/value pairs is another thing I need to do.
Hannes
Hi Hannes, LSD Radix Sort that you pointed out by Andy Adinets and Duane Merill is the latest GPU-based algorithm. They use some of my work (see the second page) and point to one of my blog entries: " We remark that Duvanenko recently made an independent, similar observation regarding the upfront consolidation of digit histograms for sequential implementations of LSD radix sorting [6]" It's a technique to split LSD Radix Sort into two phases: Histogram and Permutation, to reduce the number of passes over the array almost in half. I have only scanned Andy's and Duane's paper so far, and do not yet fully understand their technique.
WebGPU is an exciting effort.
I appreciate you writing to me, as there aren't many people that understand these particular algorithms.
Doing key/value pair sorting can be done by moving both items together, which drops performance in half, as now there is twice as much data to move around. But, maybe it's possible to come up with a technique that doesn't move the value. Maybe the index of the value is moved instead. This would pay off for 64-bit values and arrays smaller than 4 GigaElements. It doesn't pay off for 32-bit values, since the index is also 32-bits.
Take a look at my other repos: (C++) https://github.com/DragonSpit/ParallelAlgorithms (C#) https://github.com/DragonSpit/HPCsharp
I have also written a book, but it is only for CPU's: https://www.amazon.com/Practical-Parallel-Algorithms-Sorting-Multicore-ebook/dp/B0C3TZPRKZ/ref=sr_1_2?crid=23440PVJS5U12&keywords=duvanenko&qid=1699848491&sprefix=duvanenko%2Caps%2C94&sr=8-2 It offers deeper and more updated explanations for some of these CPU-targetted algorithms, but not the OneSweep of Andy's and Duane's.
-Victor
On Fri, Nov 10, 2023 at 11:33 PM 羽洲 @.***> wrote:
Hi Victor, Thanks for your message.
Ultimately, I'm not targeting JS for this sorting algorithm, but WebGPU. A parallel radix sort for WebGPU will be useful for many people, since it can be used for many graphics related problems like collision detection, for example.
From reading your blogs I can see you are an expert for parallel algorithms! I was chatting with Fei Yang who made this https://github.com/fynv/webgpu_math WebGPU implementation of a radix sort based on the latest, fastest algorithm https://arxiv.org/pdf/2206.01784.pdf (I believe?). I would like to understand this algorithm better and update Fei Yang's code a bit to make a proper interface for JS.
I actually don't know why I'm writing you all this haha, but maybe if I have a question in the future I might ask you : )
Coming back to the original question, I guess sorting for a uint32 key/value pair essentially doubles the computation time compared to sorting a pure uint32 array, because there is no way around repositioning the values the same way as the keys. This can be seen on slide 20 of this document https://developer.download.nvidia.com/video/gputechconf/gtc/2020/presentations/s21572-a-faster-radix-sort-implementation.pdf, explaining the same algorithm as above. Extending Fei Yang's code to sort key/value pairs is another thing I need to do.
Hannes
— Reply to this email directly, view it on GitHub https://github.com/DragonSpit/hpc-algorithms/issues/1#issuecomment-1806675177, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACE6YH43GXETYCYEBAGYLUDYD35YJAVCNFSM6AAAAAA7CV3FA2VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQMBWGY3TKMJXG4 . You are receiving this because you commented.Message ID: @.***>