cudf icon indicating copy to clipboard operation
cudf copied to clipboard

For powers of 10, replace ipow with switch

Open pmattione-nvidia opened this issue 1 year ago • 3 comments

This adds a new runtime calculation of the power-of-10 needed for applying decimal scale factors with a switch statement. This provides the fastest way of applying the scale. Note that the multiply and divide operations are performed within the switch itself, so that the compiler sees the full instruction to optimize assembly code gen. See code comments for details.

This cannot be used within fixed_point (e.g. for comparison operators and rescaling) as it introduced too much register pressure to unrelated benchmarks. It will only be used for the decimal <--> floating conversion, so it has been moved there to be in a new header file where that code will reside (in an upcoming PR). This is part of a larger change to change the algorithm for decimal <--> floating conversion to a more accurate one that is forthcoming soon.

Checklist

  • [x] I am familiar with the Contributing Guidelines.
  • [x] New or existing tests cover these changes.
  • [x] The documentation is up to date with these changes.

pmattione-nvidia avatar Mar 20 '24 18:03 pmattione-nvidia

/ok to test

pmattione-nvidia avatar Mar 20 '24 18:03 pmattione-nvidia

are there benchmarks that show the performance impact? Edit: Apologies, missed the last paragraph. Disregard

vuule avatar Mar 21 '24 20:03 vuule

When the full suite of benchmarks were run, it turns out that the GroupBy benchmarks were too adversely affected by this change to go in as it was. There was too much register pressure using these inlined functions for fixed-point comparison operations (rescaling) that was blowing up the benchmark. Therefore this code will now only be used for the decimal <--> floating conversion, and has been moved to a new header file dedicated for that (upcoming) code. Since it's been removed from fixed_point.hpp, we no longer need to modify the jitify compilation flags either for 128bit integer types. Finally, since the decimal <--> floating conversion only uses these functions for unsigned integers, the code has been updated for those as well.

pmattione-nvidia avatar Apr 26 '24 18:04 pmattione-nvidia

LGTM with all the latest changes!

mhaseeb123 avatar May 20 '24 18:05 mhaseeb123

/merge

pmattione-nvidia avatar May 21 '24 18:05 pmattione-nvidia