
Performance issue in use of boost::math::digamma on aarch64 Linux

Open dslarm opened this issue 1 year ago • 2 comments

Due to a current default in the Boost library (https://github.com/boostorg/math/issues/1211), calls to boost::math::digamma incur a performance hit on aarch64.

This happens on v1.10.3 of Salmon, with GNU compiler 13 on Linux aarch64.

A 4-thread quantification of one of the Salmon tutorial's DRR0* series files spends ~15% of its time in this routine (called within CollapsedEMOptimizer). On a larger example, we see a 7% performance hit on a run that takes 1300 seconds on 4 cores. On x86 this time is small enough to be lost in the noise.

salmon quant -i athal_index -l A -1 DRR016125/DRR016125_1.fastq.gz -2 DRR016125/DRR016125_2.fastq.gz -p $threads --validateMappings -o quants/DRR016125_quant
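For anyone wanting a feel for why this shows up on aarch64 but not on x86: Boost.Math promotes double to long double by default, and on aarch64 Linux long double is a 128-bit type that is normally emulated in software. The sketch below is not Boost's digamma. It is a hypothetical stand-in (`digamma_sketch` and `time_digamma` are made-up names) that uses the textbook recurrence plus asymptotic series, so the same arithmetic can be timed in double and in long double on a given machine.

```cpp
#include <chrono>
#include <cmath>
#include <cstddef>

// Hypothetical stand-in for a digamma routine (NOT Boost's implementation).
// Valid for x > 0; accurate to roughly 1e-10, which is enough for timing.
template <typename T>
T digamma_sketch(T x) {
    T result = 0;
    // Recurrence psi(x) = psi(x + 1) - 1/x, applied until x >= 10.
    while (x < T(10)) {
        result -= T(1) / x;
        x += T(1);
    }
    // Asymptotic series: ln x - 1/(2x) - 1/(12x^2) + 1/(120x^4) - 1/(252x^6).
    const T inv = T(1) / x;
    const T inv2 = inv * inv;
    result += std::log(x) - inv / T(2)
            - inv2 * (T(1) / T(12) - inv2 * (T(1) / T(120) - inv2 / T(252)));
    return result;
}

// Times n evaluations in type T and returns elapsed seconds.
// The accumulated sum is written to `sink` so the loop is not optimized away.
template <typename T>
double time_digamma(std::size_t n, T& sink) {
    const auto start = std::chrono::steady_clock::now();
    T acc = 0;
    for (std::size_t i = 0; i < n; ++i) {
        acc += digamma_sketch(T(0.5) + T(i % 1000) / T(100));
    }
    const auto stop = std::chrono::steady_clock::now();
    sink = acc;
    return std::chrono::duration<double>(stop - start).count();
}
```

Comparing `time_digamma<double>` against `time_digamma<long double>` for the same n gives a rough idea of what the promotion costs on a given platform. The ratio will not match the numbers in this issue, since the arithmetic inside Boost is different.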

A simple fix is to have the CMake/Makefiles compile salmon with -DBOOST_MATH_NO_LONG_DOUBLE_MATH_FUNCTIONS, or to add #define BOOST_MATH_NO_LONG_DOUBLE_MATH_FUNCTIONS at the start of any file that brings in boost::math.

With that change, the 1300-second runtime drops to 1212 seconds for the larger test case, and the tutorial case drops from 48 seconds to 40 on a 4-core r8g.xlarge (Graviton4).

Whilst Boost may fix the issue soon, older versions of the library are likely to remain installed for some time. It would be helpful to add this define to the CMake settings or to the sources.

dslarm avatar Oct 16 '24 11:10 dslarm

Note that I think -DBOOST_MATH_PROMOTE_DOUBLE_POLICY=false is a better option than disabling long double functions in general.

rdoeffinger avatar Oct 16 '24 12:10 rdoeffinger

I agree; I wasn’t aware of that one. I’ve tested it and it has the same effect as the other flag: performance looks good.

dslarm avatar Oct 16 '24 12:10 dslarm