gr-lora_sdr
Performance of cyl_bessel_i() on a low-powered arm64 device
While running the receiver on a low-powered device such as a Raspberry Pi, I'm seeing high CPU load. The signal is sampled at 5 Msps, with SF 11 and BW 250 kHz.
A quick profile of a run-to-completion flowgraph fed from a File Source without a Throttle block shows that boost::math::cyl_bessel_i() takes a substantial share of the run time. As it turns out, the default Boost.Math policy promotes double to long double for internal calculations, and the device struggles with long double arithmetic.
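To see why the promotion hurts on a given target, it helps to check what long double actually is there. The sketch below uses only the standard library; the helper names are made up for illustration. On arm64 Linux the mantissa is typically 113 bits (IEEE binary128, usually computed in software), on x86-64 Linux it is typically 64 bits, and on some platforms long double is simply the same as double.

```cpp
#include <limits>
#include <string>

// Mantissa width (in bits) of double and long double on the build target.
constexpr int kDoubleDigits = std::numeric_limits<double>::digits;
constexpr int kLongDoubleDigits = std::numeric_limits<long double>::digits;

// True when long double carries more precision than double, i.e. when
// promoting double to long double actually changes the arithmetic used.
// (Hypothetical helper, not part of gr-lora_sdr or Boost.)
constexpr bool long_double_is_wider() {
    return kLongDoubleDigits > kDoubleDigits;
}

// Human-readable summary for a one-off diagnostic print.
std::string describe_long_double() {
    return "double: " + std::to_string(kDoubleDigits) +
           " bits, long double: " + std::to_string(kLongDoubleDigits) +
           " bits, sizeof(long double) = " +
           std::to_string(sizeof(long double));
}
```

If long_double_is_wider() is false on a target, disabling the promotion should make no difference to the arithmetic there.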
The promotion can be disabled as described in https://www.boost.org/doc/libs/1_85_0/libs/math/doc/html/math_toolkit/tradoffs.html:
diff --git a/lib/fft_demod_impl.cc b/lib/fft_demod_impl.cc
index 784403a..f622ada 100644
--- a/lib/fft_demod_impl.cc
+++ b/lib/fft_demod_impl.cc
@@ -14,2 +14,5 @@ extern "C" {
+using namespace boost::math::policies;
+auto no_double_promotion_policy = make_policy(promote_double<false>());
+
namespace gr {
@@ -197,3 +200,4 @@ namespace gr {
if (bessel_arg < 713) // 713 ~ log(std::numeric_limits<LLR>::max())
- LLs[n] = boost::math::cyl_bessel_i(0, bessel_arg); // compute Bessel safely
+ // TODO? std::cyl_bessel_i() exists since C++17
+ LLs[n] = boost::math::cyl_bessel_i(0, bessel_arg, no_double_promotion_policy); // compute Bessel safely
else {
The fix gives a whopping ~3x speedup on an RPi4 with no decoding degradation on my signal. However, I don't know whether long double precision is strictly required here or whether it can safely be dropped like this.
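One rough way to gauge how much precision is at stake is to evaluate I0 in double and in long double and compare the results. The sketch below does that with the C++17 std::cyl_bessel_i / std::cyl_bessel_il mentioned in the TODO, not with Boost, so it only approximates what the Boost policy change does. The helper names are hypothetical, and it assumes a standard library that ships the C++17 special math functions (libstdc++ does, support elsewhere varies).

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Relative difference between I0(x) evaluated in double and in long double.
// (Hypothetical helper for illustration only.)
double bessel_i0_rel_diff(double x) {
    assert(x >= 0.0);
    const double lo = std::cyl_bessel_i(0.0, x);
    const long double hi =
        std::cyl_bessel_il(0.0L, static_cast<long double>(x));
    // I0(x) >= 1 for real x, so dividing by hi is safe.
    const long double diff = std::fabs(static_cast<long double>(lo) - hi);
    return static_cast<double>(diff / hi);
}

// Largest relative difference over n evenly spaced points in [0, x_max].
double max_bessel_i0_rel_diff(double x_max, int n) {
    assert(n >= 2);
    double worst = 0.0;
    for (int i = 0; i < n; ++i) {
        const double x = x_max * i / (n - 1);
        const double d = bessel_i0_rel_diff(x);
        if (d > worst) worst = d;
    }
    return worst;
}
```

Comparing the returned value with std::numeric_limits<double>::epsilon() shows how many digits the double path loses relative to the long double one. Whether that matters for decoding still depends on how the LLs values are used downstream, so a comparison on real captures remains the deciding test.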