gr-lora_sdr Performance of cyl_bessel_i() on a low-powered arm64 device

Open mskvortsov opened this issue 10 months ago • 4 comments

While running the receiver on a low-powered device such as a Raspberry Pi, I'm seeing high CPU load. The signal is sampled at 5 Msps, with SF 11 and BW 250 kHz.

A quick profile of a run-to-completion flowgraph fed from a File Source without a Throttle block shows that boost::math::cyl_bessel_i() takes a substantial share of the time. As it turns out, the default Boost.Math policy promotes double to long double, which the device struggles to compute with.
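
To get a feel for what the wider type costs on a given board, independent of Boost, the sketch below evaluates the same I0 power series in double and in long double. On arm64 Linux, long double is a 128-bit IEEE quad type that is typically emulated in software, which would explain the slowdown. This is an illustrative stand-in, not Boost's algorithm, and the names bessel_i0_series, relative_difference and time_series are hypothetical.

```cpp
#include <chrono>
#include <cmath>
#include <limits>

// Power series I0(x) = sum_{k>=0} ((x/2)^2)^k / (k!)^2, evaluated in type T.
// Illustrative only: Boost.Math uses different approximations internally.
template <typename T>
T bessel_i0_series(T x) {
    const T q = (x / 2) * (x / 2);
    T term = 1;  // k = 0 term
    T sum = 1;
    for (int k = 1; k < 2000; ++k) {
        term *= q / (static_cast<T>(k) * static_cast<T>(k));
        sum += term;
        // Stop once the current term no longer changes the sum at T's precision.
        if (term < std::numeric_limits<T>::epsilon() * sum) break;
    }
    return sum;
}

// Relative difference between the double and long double evaluations at x.
double relative_difference(double x) {
    const long double wide = bessel_i0_series<long double>(x);
    const long double narrow = bessel_i0_series<double>(x);
    return static_cast<double>(std::fabs((narrow - wide) / wide));
}

// Wall-clock seconds spent on `iterations` evaluations in type T.
template <typename T>
double time_series(int iterations) {
    const auto start = std::chrono::steady_clock::now();
    volatile T sink = 0;  // keeps the compiler from dropping the loop
    for (int i = 0; i < iterations; ++i)
        sink = bessel_i0_series<T>(static_cast<T>(1 + i % 700));
    (void)sink;
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

Comparing time_series<double>(100000) with time_series<long double>(100000) on the target device gives a rough ratio for the two types. The absolute numbers depend on the compiler, flags and CPU.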

The promotion can be disabled as described in https://www.boost.org/doc/libs/1_85_0/libs/math/doc/html/math_toolkit/tradoffs.html:

diff --git a/lib/fft_demod_impl.cc b/lib/fft_demod_impl.cc
index 784403a..f622ada 100644
--- a/lib/fft_demod_impl.cc
+++ b/lib/fft_demod_impl.cc
@@ -14,2 +14,5 @@ extern "C" {

+using namespace boost::math::policies;
+auto no_double_promotion_policy = make_policy(promote_double<false>());
+
 namespace gr {
@@ -197,3 +200,4 @@ namespace gr {
                 if (bessel_arg < 713)  // 713 ~ log(std::numeric_limits<LLR>::max())
-                    LLs[n] = boost::math::cyl_bessel_i(0, bessel_arg);  // compute Bessel safely
+                    // TODO? std::cyl_bessel_i() exists since C++17
+                    LLs[n] = boost::math::cyl_bessel_i(0, bessel_arg, no_double_promotion_policy);  // compute Bessel safely
                 else {

The fix gives a whopping ~3x speedup on an RPi 4 with no decoding degradation on my signal. However, I don't know whether long double precision is strictly required here, or whether it can be dropped just like that.

May 02 '24 18:05 mskvortsov
