Faster floating point comparisons

Open mborland opened this issue 3 years ago • 11 comments

mborland avatar Aug 05 '22 19:08 mborland

@NAThompson Since you got this to work previously for 128 bit float do you see anything obvious I am missing? I am getting an incomplete-type error here.

mborland avatar Aug 07 '22 15:08 mborland

Since you got this to work previously for 128 bit float do you see anything obvious I am missing?

IIRC, this was something I looked up in the quadmath manual. I couldn't find it though . . .

NAThompson avatar Aug 07 '22 20:08 NAThompson

@NAThompson This is ready for review. I pulled 128 bit support into its own header; otherwise you would have to link quadmath any time you used next.hpp.

mborland avatar Aug 23 '22 14:08 mborland

Any updates on this?

AZero13 avatar Oct 13 '22 23:10 AZero13

Here are the before and after benchmarks @NAThompson :

Original performance (Boost 1.80.0):

 Unable to determine clock rate from sysctl: hw.cpufrequency: No such file or directory
 This does not affect benchmark measurements, only the metadata output.
 2022-10-15T15:24:07-07:00
 Running ./new_next_performance
 Run on (10 X 24.0916 MHz CPU s)
 CPU Caches:
   L1 Data 64 KiB
   L1 Instruction 128 KiB
   L2 Unified 4096 KiB (x10)
 Load Average: 1.86, 2.53, 5.83
 ---------------------------------------------------------------------------------
 Benchmark                                       Time             CPU   Iterations
 ---------------------------------------------------------------------------------
 float_distance<float>/2/real_time            61.4 ns         61.4 ns      9074469
 float_distance<float>/4/real_time            61.7 ns         61.7 ns     11384150
 float_distance<float>/8/real_time            61.4 ns         61.4 ns     10814604
 float_distance<float>/16/real_time           61.7 ns         61.7 ns     11348376
 float_distance<float>/32/real_time           61.4 ns         61.4 ns     11387167
 float_distance<float>/64/real_time           61.6 ns         61.6 ns     11131932
 float_distance<float>/128/real_time          61.4 ns         61.4 ns     11382029
 float_distance<float>/256/real_time          61.4 ns         61.4 ns     11307649
 float_distance<float>/512/real_time          61.4 ns         61.4 ns     11376048
 float_distance<float>/1024/real_time         61.4 ns         61.4 ns     11355748
 float_distance<float>/2048/real_time         61.8 ns         61.8 ns     11373776
 float_distance<float>/4096/real_time         61.4 ns         61.4 ns     11382368
 float_distance<float>/8192/real_time         61.4 ns         61.4 ns     11353453
 float_distance<float>/16384/real_time        61.4 ns         61.4 ns     11378298
 float_distance<float>/real_time_BigO        61.48 (1)       61.47 (1)
 float_distance<float>/real_time_RMS             0 %             0 %
 float_distance<double>/2/real_time           55.6 ns         55.6 ns     12580218
 float_distance<double>/4/real_time           55.6 ns         55.6 ns     12577835
 float_distance<double>/8/real_time           55.6 ns         55.6 ns     12564909
 float_distance<double>/16/real_time          56.2 ns         56.2 ns     12554909
 float_distance<double>/32/real_time          56.0 ns         56.0 ns     12544381
 float_distance<double>/64/real_time          55.6 ns         55.6 ns     12566488
 float_distance<double>/128/real_time         55.6 ns         55.6 ns     12499581
 float_distance<double>/256/real_time         55.6 ns         55.6 ns     12565661
 float_distance<double>/512/real_time         56.1 ns         56.1 ns     12550023
 float_distance<double>/1024/real_time        55.8 ns         55.8 ns     12568603
 float_distance<double>/2048/real_time        55.6 ns         55.6 ns     12546049
 float_distance<double>/4096/real_time        55.6 ns         55.6 ns     12528525
 float_distance<double>/8192/real_time        55.9 ns         55.9 ns     12563030
 float_distance<double>/16384/real_time       56.0 ns         56.0 ns     12447644
 float_distance<double>/real_time_BigO       55.78 (1)       55.78 (1)
 float_distance<double>/real_time_RMS            0 %             0 %

New performance:

 Unable to determine clock rate from sysctl: hw.cpufrequency: No such file or directory
 This does not affect benchmark measurements, only the metadata output.
 2022-10-15T15:31:37-07:00
 Running ./new_next_performance
 Run on (10 X 24.122 MHz CPU s)
 CPU Caches:
   L1 Data 64 KiB
   L1 Instruction 128 KiB
   L2 Unified 4096 KiB (x10)
 Load Average: 2.12, 2.17, 4.26
 ---------------------------------------------------------------------------------
 Benchmark                                       Time             CPU   Iterations
 ---------------------------------------------------------------------------------
 float_distance<float>/2/real_time            15.8 ns         15.8 ns     42162717
 float_distance<float>/4/real_time            15.9 ns         15.9 ns     44213877
 float_distance<float>/8/real_time            15.8 ns         15.8 ns     43972542
 float_distance<float>/16/real_time           15.8 ns         15.8 ns     44209456
 float_distance<float>/32/real_time           15.8 ns         15.8 ns     44200244
 float_distance<float>/64/real_time           15.8 ns         15.8 ns     44239293
 float_distance<float>/128/real_time          15.8 ns         15.8 ns     44171202
 float_distance<float>/256/real_time          15.8 ns         15.8 ns     44241507
 float_distance<float>/512/real_time          15.9 ns         15.8 ns     44230034
 float_distance<float>/1024/real_time         15.8 ns         15.8 ns     44241554
 float_distance<float>/2048/real_time         15.8 ns         15.8 ns     44220802
 float_distance<float>/4096/real_time         15.8 ns         15.8 ns     44220441
 float_distance<float>/8192/real_time         15.9 ns         15.9 ns     44213994
 float_distance<float>/16384/real_time        15.8 ns         15.8 ns     44215413
 float_distance<float>/real_time_BigO        15.83 (1)       15.83 (1)
 float_distance<float>/real_time_RMS             0 %             0 %
 float_distance<double>/2/real_time           15.5 ns         15.5 ns     45098165
 float_distance<double>/4/real_time           15.6 ns         15.6 ns     45065465
 float_distance<double>/8/real_time           15.5 ns         15.5 ns     45058733
 float_distance<double>/16/real_time          15.8 ns         15.7 ns     45078404
 float_distance<double>/32/real_time          15.5 ns         15.5 ns     44832734
 float_distance<double>/64/real_time          15.5 ns         15.5 ns     45077303
 float_distance<double>/128/real_time         15.5 ns         15.5 ns     45067255
 float_distance<double>/256/real_time         15.5 ns         15.5 ns     45073844
 float_distance<double>/512/real_time         15.6 ns         15.6 ns     45109342
 float_distance<double>/1024/real_time        15.5 ns         15.5 ns     44845180
 float_distance<double>/2048/real_time        15.5 ns         15.5 ns     45051846
 float_distance<double>/4096/real_time        15.5 ns         15.5 ns     45064317
 float_distance<double>/8192/real_time        15.5 ns         15.5 ns     45115653
 float_distance<double>/16384/real_time       15.5 ns         15.5 ns     45067642
 float_distance<double>/real_time_BigO       15.54 (1)       15.54 (1)
 float_distance<double>/real_time_RMS            0 %             0 %

mborland avatar Oct 15 '22 22:10 mborland
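
For a rough sense of scale, dividing the BigO summary rows of the two runs above gives the following speedup ratios. This is plain arithmetic on the reported per-call times (in ns), rounded to two decimals, and it applies only to this machine and this benchmark.

```latex
\frac{61.48}{15.83} \approx 3.88 \quad (\texttt{float}), \qquad
\frac{55.78}{15.54} \approx 3.59 \quad (\texttt{double})
```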

@mborland : Does it need a fast_float_distance header? Could we not just improve the performance of the current implementation?

@jzmaddock : This is looking like it's about ready to go; might want to take a look.

NAThompson avatar Oct 15 '22 22:10 NAThompson

@mborland : Does it need a fast_float_distance header? Could we not just improve the performance of the current implementation?

The float and double cases improve upon the current implementation. I could not get the __float128 case to work without forcing the user to link -lquadmath if using GCC which would be an unwelcome breaking change.

mborland avatar Oct 15 '22 22:10 mborland

I could not get the __float128 case to work without forcing the user to link -lquadmath if using GCC which would be an unwelcome breaking change.

Wait, I thought we had to link libquadmath to use __float128 . . .

NAThompson avatar Oct 16 '22 05:10 NAThompson

I could not get the __float128 case to work without forcing the user to link -lquadmath if using GCC which would be an unwelcome breaking change.

Could we use a judicious __has_include to work around this?

Also, 99% of the value will be from float and double . . .

NAThompson avatar Mar 02 '23 02:03 NAThompson

Having the include isn't enough in this case. You have to explicitly link to -lquadmath.

mborland avatar Mar 02 '23 03:03 mborland
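
To illustrate the point about the include not being enough: `__has_include` is evaluated by the preprocessor, so it can only report whether the header can be found at compile time. It says nothing about whether the user passes `-lquadmath` at link time. A minimal sketch, in which the macro `HYPOTHETICAL_HAS_QUADMATH_HEADER` and the function `hypothetical_quadmath_header_visible` are names made up for this example:

```cpp
// Detect only whether <quadmath.h> is visible to the preprocessor.
// The nested #if keeps this valid on compilers without __has_include.
#if defined(__has_include)
#  if __has_include(<quadmath.h>)
#    define HYPOTHETICAL_HAS_QUADMATH_HEADER 1
#  endif
#endif

#ifndef HYPOTHETICAL_HAS_QUADMATH_HEADER
#  define HYPOTHETICAL_HAS_QUADMATH_HEADER 0
#endif

// Reports header visibility only. A true result does not mean that calls
// into libquadmath will link; that still depends on the linker command line.
constexpr bool hypothetical_quadmath_header_visible()
{
    return HYPOTHETICAL_HAS_QUADMATH_HEADER != 0;
}
```

Code guarded by such a macro would compile wherever the header exists, yet any call into the library could still fail with undefined references unless the program is linked with `-lquadmath`, which is the breaking change described above.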

@jzmaddock Can you please take a look at this one? @NAThompson hit me up about merging this.

mborland avatar Apr 13 '23 12:04 mborland