libm Add cr_log from core-math

Don't look too close, this is intentionally terrible code (line for line)

ci: allow-regressions

Oct 27 '24 01:10 tgross35

Initial pass with things looking correct shows a 15x regression for softfloat mode

icount::icount_bench_log_group::icount_bench_log logspace:setup_log()
Performance has regressed: Instructions (526320 > 34702) regressed by +1416.68% (>+5.00000)
  Baselines:                      softfloat|softfloat
  Instructions:                      526320|34702                (+1416.68%) [+15.1668x]
  L1 Hits:                           586739|41280                (+1321.36%) [+14.2136x]
  L2 Hits:                                3|2                    (+50.0000%) [+1.50000x]
  RAM Hits:                             175|17                   (+929.412%) [+10.2941x]
  Total read+write:                  586917|41299                (+1321.14%) [+14.2114x]
  Estimated Cycles:                  592879|41885                (+1315.49%) [+14.1549x]

Feb 12 '25 07:02 tgross35

Version with fma is much more tolerable

icount::icount_bench_log_group::icount_bench_log logspace:setup_log()
Performance has regressed: Instructions (74217 > 34702) regressed by +113.870% (>+5.00000)
  Baselines:                      hardfloat|hardfloat
  Instructions:                       74217|34702                (+113.870%) [+2.13870x]
  L1 Hits:                           107651|41281                (+160.776%) [+2.60776x]
  L2 Hits:                                5|2                    (+150.000%) [+2.50000x]
  RAM Hits:                             158|16                   (+887.500%) [+9.87500x]
  Total read+write:                  107814|41299                (+161.057%) [+2.61057x]
  Estimated Cycles:                  113206|41851                (+170.498%) [+2.70498x]

Feb 12 '25 08:02 tgross35

Fyi @beetrees, this PR and https://github.com/rust-lang/libm/pull/322 are probably reasonably close to merging, after a bit of cleanup. I think there is some restructuring to be done here and some of the perf might be possible to get back, but I'd rather just do that cleanup after a working baseline is in tree.

Feb 12 '25 10:02 tgross35

Hm, the LUTs here have a huge impact on binary size. Thinking maybe I should keep the current implementations around behind a feature gate. Edit: or maybe just have an option to only use the fast method without cr_log_accurate? That eliminates two tables.

Feb 12 '25 10:02 tgross35

I should try marking the accurate versions #[cold] (note to self)

Feb 12 '25 10:02 tgross35

Hm, the LUTs here have a huge impact on binary size. Thinking maybe I should keep the current implementations around behind a feature gate.

I think the ideal API surface would be the libm::log function is always correctly rounded, and then there could be a libm::log_approx function (name bikeshedable) which is faster and/or has a smaller binary footprint but has less accuracy. I think LTO would be able to remove the unused functions/LUTs if they have no users? If so then I think relying on LTO is preferable to having Cargo features as it can remove all the libm functions that an application doesn't use (I doubt most applications use every single libm function) without having to have a feature per function.

Edit: or maybe just have an option to only use the fast method without cr_log_accurate? That eliminates two tables.

That does seem more maintainable than having two completely separate implementations.

Feb 12 '25 16:02 beetrees

Recreated as https://github.com/rust-lang/compiler-builtins/pull/860

Apr 20 '25 04:04 tgross35

libm libm copied to clipboard

Add cr_log from core-math

libm
libm copied to clipboard