arkouda
Add some subtimers to RadixSortLSD
When optimizing sort performance, it's invaluable to have subtimings to see where time is going. Historically I've maintained branches that keep the timers in them, but I found myself wondering whether it's worth adding the timing code more permanently. This PR prototypes that in a way that is free when timers are disabled (currently via a config param) and relatively cheap when timers are on, though it does add some barriers.
Previous versions of this code had explicit timers per section, which felt ugly, so this adds an enum-based wrapper. Basically, you declare an enum that has a value for each section you want to time and then mark where you want to start and stop timing. This results in efficient timing (you just index into an array and start a timer), and the print order is defined by the enum order. I also tried a (custom) orderedMap of strings->timers, but that is more error-prone (no compile-time checking of string values) and more expensive (because it requires string hashing or string comparison in some way).
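To make the enum-based pattern concrete, here is a minimal sketch in Python. It is not the Chapel code from this PR: the names `SortSection`, `SectionTimers`, and the section names are hypothetical, and the `enabled` flag only stands in for the config param (in Python it is a runtime check, not a compile-time one). It assumes the enum values are 0..n-1 so they can index directly into a list.

```python
import enum
import time


class SortSection(enum.IntEnum):
    # Hypothetical section names. Values must be 0..n-1 (an assumption of
    # this sketch) so each member indexes directly into the timer arrays.
    HISTOGRAM = 0
    SCAN = 1
    SCATTER = 2


class SectionTimers:
    """Accumulates elapsed wall-clock time per enum member."""

    def __init__(self, sections, enabled=True):
        self.sections = sections
        self.enabled = enabled  # stand-in for the config param
        n = len(sections)
        self.elapsed = [0.0] * n
        self.started = [None] * n

    def start(self, section):
        if self.enabled:
            self.started[section.value] = time.perf_counter()

    def stop(self, section):
        if self.enabled:
            i = section.value
            self.elapsed[i] += time.perf_counter() - self.started[i]
            self.started[i] = None

    def report(self):
        # Iterating an Enum class yields members in definition order,
        # so the print order follows the enum order.
        if not self.enabled:
            return []
        return [f"{s.name}: {self.elapsed[s.value]:.6f}s" for s in self.sections]


if __name__ == "__main__":
    timers = SectionTimers(SortSection)
    timers.start(SortSection.SCAN)
    sum(range(100_000))  # placeholder work
    timers.stop(SortSection.SCAN)
    print("\n".join(timers.report()))
```

A misspelled section such as `SortSection.SCAM` fails as an attribute error at the call site rather than silently creating a new timer, which is the closest Python analogue to the compile-time checking that a string-keyed map lacks.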
That said, I think the orderedMap may have value for cross-module timings and non-production code, and it may be worth committing for use in future explorations. For an example, see a use case in previous work: https://github.com/Bears-R-Us/arkouda/pull/1579/files#diff-9831b61e058453cfb3a06ac7460e988d95b1344b6cafa626560133bfc71826ff
This isn't ready to commit, but I'd like input from others. I find these timers invaluable for performance investigations, but even this cleaned-up version adds clutter to the code, so I'm not sure whether it's worth having in the production code or whether I should just keep it off in a branch.
For reference, https://github.com/Bears-R-Us/arkouda/compare/sort-subtimers is the branch I carry around. This PR has a few benefits over it -- you don't need to declare a timer or manually provide the printing order. It can also time within an SPMD section by doing a barrier and then having only one task start the timer. This enables finer-grained timings than previous branches.
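The barrier-then-single-task idea can be sketched with Python threads. This only illustrates the pattern, not the Chapel implementation in this PR: `run_spmd_timed` and its parameters are hypothetical names, and threads stand in for SPMD tasks. The sketch assumes `work` does not raise; a task that fails before reaching a barrier would leave the others waiting.

```python
import threading
import time


def run_spmd_timed(num_tasks, work):
    """Run work(task_id) on num_tasks threads and return the elapsed time
    of the section, measured by task 0 between two barriers."""
    barrier = threading.Barrier(num_tasks)
    result = {}

    def task(task_id):
        barrier.wait()  # all tasks arrive before the clock starts
        if task_id == 0:
            t0 = time.perf_counter()  # only one task touches the timer
        work(task_id)
        barrier.wait()  # all tasks finish before the clock stops
        if task_id == 0:
            result["elapsed"] = time.perf_counter() - t0

    threads = [threading.Thread(target=task, args=(i,)) for i in range(num_tasks)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result["elapsed"]


if __name__ == "__main__":
    elapsed = run_spmd_timed(4, lambda task_id: sum(range(50_000)))
    print(f"section: {elapsed:.6f}s")
```

The two barriers are the cost mentioned in the description: they make the measurement cover the slowest task in the section, at the price of extra synchronization when timers are on.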
Coming to this late, but I would be in favor of adding this. I have found a lot of value in stealing the timers from your branch and I think having it in Arkouda would make it much easier for everyone to locate and use as an example.
I'd still like to add this, but closing until I have time to clean up a little more