
Feedback: Audio underrun on startup for up to 25 seconds on Debian Buster

Open linuxonlinehelp opened this issue 4 years ago • 23 comments

Today I rolled my new Pi 3 B+ (2019) back to Debian Stretch (same on a Pi 4 4GB, 2019), because I got a permanent audio buffer lockup of up to 25 seconds when starting openwebrx.py. openwebrx is OK on: libusb-1.0-0, gcc 6.3.0.18, cmake 3.7.2, kernel 4.14.98-v7+, python 2.7.13, using an RTL-SDR dongle (silver, version 3).

On Buster I always get audio buffer underruns. Python is 2.7.16 there; are there major changes? Or is it a libusb problem (libusb 1.0.3, kernel 4.19 ...)? Who knows why? csdr seems to hang while calculating the low and high cut of 4000: it waits there for 20~25 seconds, then runs cleanly, on both the Pi 3 and the Pi 4 (4GB).

System load was always ~25%, with no kernel messages or logs.

linuxonlinehelp avatar Oct 10 '19 15:10 linuxonlinehelp

Hanging Area:

    [openwebrx-httpd:ws,0] command: SET mod=nfm low_cut=-4000 high_cut=4000 offset_freq=0
    csdr old_fractional_decimator_ff: window = HAMMING
    csdr old_fractional_decimator_ff: taps_length = 133
    csdr bandpass_fir_fft_cc: (fft_size = 512) = (taps_length = 139) + (input_size = 374) - 1
    (overlap_length = 138) = taps_length - 1
    csdr shift_addition_cc: reinitialized to -0
    csdr bandpass_fir_fft_cc: filter initialized, low_cut = -0.361631, high_cut = 0.361631
    client 0x10c32c8: CS_THREAD_FINISHED, client_goto_source = 2, errno = 32
    [openwebrx-httpd:ws,0] command: SET low_cut=-4000 high_cut=4000 offset_freq=151719
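The sizes in the bandpass_fir_fft_cc log line are tied together by simple arithmetic: filtering a block of input_size samples with taps_length taps produces input_size + taps_length - 1 output samples, and that result has to fit into one FFT. A minimal sketch of the relation in C; the helper names are made up for illustration and are not csdr functions.

```c
/* Number of fresh input samples that fit into one FFT block, given the
   FFT size and the filter length (hypothetical helper, not part of csdr). */
static int input_size_for(int fft_size, int taps_length)
{
    return fft_size - taps_length + 1;
}

/* Number of samples that spill over into the next block. */
static int overlap_length_for(int taps_length)
{
    return taps_length - 1;
}
```

With the values from the log, input_size_for(512, 139) gives 374 and overlap_length_for(139) gives 138, which matches what csdr printed.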

linuxonlinehelp avatar Oct 10 '19 15:10 linuxonlinehelp

After testing Raspbian Buster with the Stretch firmware on kernel 4.14-98: no change! The Buster default kernels 4.19-75 and 4.19-79 fail with no logs! I found information about memory leaks, which may be a factor in this bug: https://github.com/roger-/pyrtlsdr. Installed package lists for Buster and Stretch, to compare: https://drive.google.com/open?id=1gXQCKo8ImFb7i89sbFfzamYV5VKIyLJR

linuxonlinehelp avatar Oct 12 '19 13:10 linuxonlinehelp

Same problem here with RPI3 and Raspbian Buster.

manofftoday avatar Oct 13 '19 10:10 manofftoday

I think the problem is in the csdr bandpass_fir_fft_cc function, which performs a benchmark instead of an estimate. The benchmark is really slow, and on startup it seems to crash something (I suppose nmux), which triggers a restart of the whole chain. I fixed it with this patch and it works for me, see http://sdr.undo.it

--- ../csdr.old/csdr.c 2017-09-25 21:45:33.018152254 +0000 +++ csdr.c 2019-11-21 17:09:11.068684954 +0000 @@ -1849,14 +1849,14 @@ //make FFT plans for continously processing the input complexf* input = fft_malloc(fft_sizesizeof(complexf)); complexf input_fourier = fft_malloc(fft_size*sizeof(complexf));

  •    FFT_PLAN_T* plan_forward = make_fft_c2c(fft_size, input, input_fourier, 1, 1); //forward, do benchmark
    
  •    FFT_PLAN_T* plan_forward = make_fft_c2c(fft_size, input, input_fourier, 1, 0); //forward, do benchmark
    
       complexf* output_fourier = fft_malloc(fft_size*sizeof(complexf));
       complexf* output_1 = fft_malloc(fft_size*sizeof(complexf));
       complexf* output_2 = fft_malloc(fft_size*sizeof(complexf));
       //we create 2x output buffers so that one will preserve the previous overlap:
    
  •    FFT_PLAN_T* plan_inverse_1 = make_fft_c2c(fft_size, output_fourier, output_1, 0, 1); //inverse, do benchmark
    
  •    FFT_PLAN_T* plan_inverse_2 = make_fft_c2c(fft_size, output_fourier, output_2, 0, 1);
    
  •    FFT_PLAN_T* plan_inverse_1 = make_fft_c2c(fft_size, output_fourier, output_1, 0, 0); //inverse, do benchmark
    
  •    FFT_PLAN_T* plan_inverse_2 = make_fft_c2c(fft_size, output_fourier, output_2, 0, 0);
       //we initialize this buffer to 0 as it will be taken as the overlap source for the first time:
       for(int i=0;i<fft_size;i++) iof(plan_inverse_2->output,i)=qof(plan_inverse_2->output,i)=0;
    

nackstein avatar Nov 21 '19 17:11 nackstein

Hi, thanks, I will try that. I had the same issues on: Odroid N2 4GB (hexa-core, Ubuntu 18.04 / Buster Armbian) and Pi 4 4GB RAM (Buster Raspbian).

linuxonlinehelp avatar Nov 27 '19 00:11 linuxonlinehelp

Had the same issues on an Orange Pi H3 512MB.

  • installed Armbian Stretch: Linux orangepipc 5.3.9-sunxi #19.11.3 SMP Mon Nov 18 18:49:43 CET 2019 armv7l GNU/Linux

What I did:

  • fixed csdr.c, recompiled with make && make install

openwebrx.py settings:

  • fft_fps=3
  • fft_voverlap_factor=0.1
  • mathbox_waterfall_history_length = 5
  • samp_rate = 1200000 # with "1200000" the audio underrun disappeared, NOT 120000!
  • center_freq = 438950000 # 70cm band

Now the CPU is at 22% and over LAN it works like a charm.

linuxonlinehelp avatar Nov 30 '19 18:11 linuxonlinehelp

I cannot get the patch to compile. In the line complexf* input = fft_malloc(fft_sizesizeof(complexf)); something seems strange with fft_size and sizeof... What am I missing?

oe2lsp avatar Dec 01 '19 11:12 oe2lsp

> I cannot get the patch to compile. In the line complexf* input = fft_malloc(fft_sizesizeof(complexf)); something seems strange with fft_size and sizeof... What am I missing?

Unfortunately I wasn't able to paste the patch correctly; I don't know the markup language very well, and the patch got all split up in the output above. Try this patch. Anyway, it's very simple: just change the last 1 to 0 in the make_fft_c2c calls. This changes the behavior of the function so that it performs an estimate instead of a benchmark.

--- ../csdr.old/csdr.c 2017-09-25 21:45:33.018152254 +0000
+++ csdr.c 2019-11-21 17:09:11.068684954 +0000
@@ -1849,14 +1849,14 @@
     //make FFT plans for continously processing the input
     complexf* input = fft_malloc(fft_size*sizeof(complexf));
     complexf* input_fourier = fft_malloc(fft_size*sizeof(complexf));
-    FFT_PLAN_T* plan_forward = make_fft_c2c(fft_size, input, input_fourier, 1, 1); //forward, do benchmark
+    FFT_PLAN_T* plan_forward = make_fft_c2c(fft_size, input, input_fourier, 1, 0); //forward, do benchmark

     complexf* output_fourier = fft_malloc(fft_size*sizeof(complexf));
     complexf* output_1 = fft_malloc(fft_size*sizeof(complexf));
     complexf* output_2 = fft_malloc(fft_size*sizeof(complexf));
     //we create 2x output buffers so that one will preserve the previous overlap:
-    FFT_PLAN_T* plan_inverse_1 = make_fft_c2c(fft_size, output_fourier, output_1, 0, 1); //inverse, do benchmark
-    FFT_PLAN_T* plan_inverse_2 = make_fft_c2c(fft_size, output_fourier, output_2, 0, 1);
+    FFT_PLAN_T* plan_inverse_1 = make_fft_c2c(fft_size, output_fourier, output_1, 0, 0); //inverse, do benchmark
+    FFT_PLAN_T* plan_inverse_2 = make_fft_c2c(fft_size, output_fourier, output_2, 0, 0);
     //we initialize this buffer to 0 as it will be taken as the overlap source for the first time:
     for(int i=0;i<fft_size;i++) iof(plan_inverse_2->output,i)=qof(plan_inverse_2->output,i)=0;
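To make the effect of that last argument concrete, here is a small sketch in C of how a benchmark switch could map onto FFTW planner flags. It assumes, as the "do benchmark" comments suggest, that the final parameter of make_fft_c2c selects between measuring and estimating; planner_flag is a hypothetical helper, and the two constants are stand-ins that mirror the values in <fftw3.h> so the sketch builds without FFTW installed.

```c
/* Stand-ins for the flags from <fftw3.h>; real code would include the
   header instead of defining these. */
#ifndef FFTW_MEASURE
#define FFTW_MEASURE  (0U)
#define FFTW_ESTIMATE (1U << 6)
#endif

/* Hypothetical helper: translate a "benchmark" switch into a planner flag.
   1 asks FFTW to time candidate plans, 0 asks for a quick heuristic plan. */
static unsigned planner_flag(int benchmark)
{
    return benchmark ? FFTW_MEASURE : FFTW_ESTIMATE;
}
```

Under that assumption, changing the trailing 1 to 0 in the three calls means every plan is created with FFTW_ESTIMATE, so no timing runs happen at startup.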

nackstein avatar Dec 03 '19 12:12 nackstein

I just tested downgrading libfftw3* from 3.3.8 (which is the version included in buster) to 3.3.5 (which is the version in stretch), which seems to restore the original, quick startup. Anybody know what's going on between these two versions?

I do understand that switching to FFTW_ESTIMATE also does the trick, but is there any insight why? The code has been doing FFTW_MEASURE for ages, has the behaviour been changed?

There's no other versions on the raspbian repository to try. Here's where I got the packages for the downgrade: http://raspbian.raspberrypi.org/raspbian/pool/main/f/fftw3/

jketterl avatar Dec 04 '19 16:12 jketterl

I found this part of the documentation: http://www.fftw.org/fftw3_doc/Cycle-Counters.html

and i found this in the changelog for version 3.3.6p2-1 of the debian package:

  * ARM targets have --with-slow-timer enabled to avoid difficulties with
    erratic timers for planning and self-optimisation

If I puzzle this together correctly, the .deb for 3.3.5 came without any cycle counter, and as such fell back to FFTW_ESTIMATE. From version 3.3.6p2-1 onward, a "slow" cycle timer that is implemented in software has been enabled, which probably allows FFTW_MEASURE to work for the first time, albeit "slow".

Summing up: that means the patch suggested by @nackstein should restore the known behaviour for ARM processors. For a useful patch, the fix should probably be wrapped in preprocessor statements so that it is only applied on ARM processors.
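One way such a preprocessor guard could look is sketched below in C. It relies on __arm__ and __aarch64__, which GCC and Clang predefine on 32-bit and 64-bit ARM targets; PLAN_ALWAYS_ESTIMATE and effective_planner_flag are hypothetical names, and the FFTW constants are stand-ins mirroring <fftw3.h> so the sketch compiles on its own. This is an illustration of the idea, not the code from any actual patch.

```c
/* Stand-ins for the flags from <fftw3.h>; real code would include the header. */
#ifndef FFTW_MEASURE
#define FFTW_MEASURE  (0U)
#define FFTW_ESTIMATE (1U << 6)
#endif

/* Decide at compile time whether measuring should be disabled.
   PLAN_ALWAYS_ESTIMATE is a made-up macro name for this sketch. */
#if defined(__arm__) || defined(__aarch64__)
#define PLAN_ALWAYS_ESTIMATE 1
#else
#define PLAN_ALWAYS_ESTIMATE 0
#endif

/* On ARM builds the benchmark request is ignored; elsewhere it is honoured. */
static unsigned effective_planner_flag(int benchmark)
{
    if (PLAN_ALWAYS_ESTIMATE)
        return FFTW_ESTIMATE;
    return benchmark ? FFTW_MEASURE : FFTW_ESTIMATE;
}
```

The drawback is that the decision is tied to the CPU architecture and not to whether the installed FFTW was actually built with the slow timer.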

I am currently recompiling fftw3 3.3.8 without the flag to verify.

I have also seen that there is hardware cycle support for armv7a processors. not sure if raspi processors fall into that, but it might be worth a try.

jketterl avatar Dec 04 '19 21:12 jketterl

confirmed: removing --with-slow-timer from the build restores quick startup, too. I will attempt to get a proper fix for this.

jketterl avatar Dec 05 '19 13:12 jketterl

i have opened up a pull request that should resolve this for raspberry users: https://github.com/simonyiszk/csdr/pull/51

it may be a little over the top since it applies to all arm processors, i am definitely open for ideas on how to detect the actual scenario. please leave replies about that on the PR.

jketterl avatar Dec 05 '19 15:12 jketterl

@jketterl I tested the changes on the Odroid N2 but was NOT able to compile with make, because some of the NEON parameters don't work. So I commented out / removed all NEON parameters in the Makefile to force gcc to autodetect the ARM parameters, which lets make start but then runs into an error. For me, the workaround from @nackstein works only with empty NEON parameters and the performance check disabled, because the Odroid N2 combines two different CPU types (A73 and A53) into a hexa-core system. If I get some free time, I will try a setup on a Raspberry Pi 3 with Buster, where startup locks up for up to 30 seconds.

linuxonlinehelp avatar Dec 06 '19 15:12 linuxonlinehelp

Yes, I believe the CPU detection / compiler optimisation in the Makefile is broken in more than one way (e.g. it detects a Raspberry Pi by looking for "BCM2708" in /proc/cpuinfo; I have tested a bunch of my Raspberries for this, and it only applies to a single 1B+). I am however not knowledgeable enough in the field of CPU hardware to fix that.

If you did have a way to compile this before my changes, it should still work that way now. Just make sure that you keep the newly added -DCSDR_DISABLE_FFTW_MEASURE somewhere in there.

jketterl avatar Dec 06 '19 15:12 jketterl

I wrote to the Debian maintainer of fftw3 asking them to check this behavior, because newbies can't fix this and it makes openwebrx unusable on ALL 2019 ARM OS setups. BUT WE NEED THIS, because it is the one and only open-source WebSDR server software, by Andras.

linuxonlinehelp avatar Dec 06 '19 15:12 linuxonlinehelp

well, i didn't inquire, but i'm assuming there is a story behind why they enabled it in the first place. Unfortunately, recompiling the packages is quite the process.

jketterl avatar Dec 06 '19 15:12 jketterl

Hello. I am the one responsible for the 3.3.6 upload to Debian, which was mostly motivated to help fftw on ARM, really. Thank you for all your trouble to identify the culprit. Please allow me some extra time to find some external input on this issue. If there is no technical agreement/solution then we should possibly have two packages, both compiled with different parameters.

smoe avatar Dec 06 '19 17:12 smoe

I don't have the external input, yet, but memory kind of kicks back in. Our work on the fftw update was motivated to get best-possible fftw performance in a high-performance setup with RPis (for Einstein@Home this was). No idea if this holds for the RPi4, but with previous models the RPi had problems to give exact timings, so you could not tell what route in fftw was the best. Hence the "slow timer" setting. Once a so called "wisdom" file was created, which takes a few hours to create, the planning is known for subsequent program invocations. Even in case that the RPi4 no longer needs that slow timer, there is yet only one package for all RPi versions and to have the slow timer parameter set for that one package kind of seems right to me.

FFTW_ESTIMATE basically says that you don't care too much about using the best-possible way to compute with FFTW. So, the planner within FFTW has less to think about. If that is technically sufficient for your application, then I think I would just go and set that flag at startup.

Did any of you look into wisdom files? This should dramatically improve the performance of FFTW. No idea if this also reduces noise levels for you; that would actually be interesting to learn about. You need one wisdom file per platform. This should then grant immediate startup times, too. However, from what I understood, it is a timeout/crash somewhere else that is the main cause for the delay. Maybe you want to have a look into that, too.

A SDR is on my Xmas shopping list. Anyway, let's wait for what the ARM+FFTW experts say.

smoe avatar Dec 06 '19 18:12 smoe

We had a similar discussion some while ago: https://salsa.debian.org/science-team/fftw3/commit/20ebb730db9abeaf74145d6beb8035800fb2c05f

There is no cycle counter available in user space on arm/arm64, therefore we have to rely on with-slow-timer to get proper plan generation on ARM. Without the with-slow-timer flag, fftw has no way of benchmarking and therefore always fell back to FFTW_ESTIMATE, no matter what you specified.

From the FFTW-Manual http://www.fftw.org/fftw3_doc/Cycle-Counters.html: "If you are not supported, FFTW will by default fall back on its estimator (effectively using FFTW_ESTIMATE for all plans). "

The simplest fix for that would be to use FFTW_ESTIMATE on all ARM devices. It doesn't matter if it's a Pi 4, Pi 3, Odroid N2 or whatever. It's the same on all ARM devices. That should restore the behavior of your app before the fftw 3.3.6p2-1 update.

A better solution would be to once generate a wisdom, export it and load it on next startup.

N30dG avatar Dec 07 '19 09:12 N30dG

@N30dG, thank you for helping out and for the link to the fftw3 repository on salsa (which is where the packaging work is orchestrated) and the associated discussion.

@all, the generation of the wisdom file needs insights on how exactly the fftw3 library is invoked. What would be pretty cool is to have the exact command line for fftw-wisdom (or its variants) given in the documentation of openwebrx, and openwebrx should possibly also collect working wisdom files for different platforms from the community.

As a bit of a sidenote, I just saw that there is a Debian package for rtl-sdr but none for libcsdr. To compensate for your extra hassle with the ARM platform I could help with a Debian package for that library. Tell me.

smoe avatar Dec 08 '19 21:12 smoe

thank you @smoe and @N30dG for providing some insight on what exactly is going on, and why the slow timer was enabled. I really don't mind the slow timer per se, given that the old behaviour is still available by using FFTW_ESTIMATE. The only thing I'm having trouble with is detecting the slow timer, since I'd really like to keep FFTW_MEASURE in place for those scenarios where it works.

As for wisdom: I have had a quick look at them, but I have not yet fully understood the code well enough to patch it in. I'm concerned with the dynamic nature of this application, I'm not sure if the actual fft parameters will be repeating. If they are not, wisdom files would probably not help much.

Also, I wanted to clarify that openwebrx is only indirectly affected since it calls the csdr command-line tools; this issue would probably be better placed in the csdr project.

I have also done some work on a Debian package over on my fork: https://github.com/jketterl/csdr/tree/debian - I'm currently trying to package all of the openwebrx related tools to facilitate simpler installation, even though I do not intend to publish them into the Debian repositories, not for now. Many parts need polishing.

jketterl avatar Dec 08 '19 22:12 jketterl

Nice to hear you already started with the packaging. I happily sponsor an upload for you.

For all ARM platforms you can just use FFTW_ESTIMATE since prior to 3.3.6 this is all you had, anyway. And you seemed (seem for those falling back to <=3.3.5) happy with it. As I said, I would really like to learn if noise levels are different - if not then it sounds much like a non-issue except for some CPU time wasted. To learn about the platform you are running on it seems completely fine to just invoke "arch" or "uname -m", which should be on all Linux platforms.
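From inside a C program, the same information that "uname -m" prints is available through the POSIX uname() call, so no external command needs to be spawned. The following is a rough sketch under the assumption that ARM machine strings start with "arm" (such as "armv7l") or "aarch64"; both function names are invented for this example.

```c
#include <string.h>
#include <sys/utsname.h>

/* Returns 1 if a "uname -m" style string looks like an ARM machine,
   for example "armv7l" or "aarch64", and 0 otherwise. */
static int machine_is_arm(const char *machine)
{
    return strncmp(machine, "arm", 3) == 0 ||
           strncmp(machine, "aarch64", 7) == 0;
}

/* Returns 1 on ARM, 0 on other machines, -1 if uname() fails. */
static int running_on_arm(void)
{
    struct utsname info;
    if (uname(&info) < 0)
        return -1;
    return machine_is_arm(info.machine);
}
```

Like the compile-time variant, this only identifies the architecture; it does not reveal whether the installed FFTW build has the slow timer enabled.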

You are right with regard to the unclear size argument of the FFTW invocation. This likely depends on the bandwidth of the individual SDR, right? Just guessing. @N30dG kindly pointed me in a PM to https://github.com/simonyiszk/csdr/blob/master/fft_fftw.c where you may want to patch in a dump of all newly requested sizes to stderr. Then the users know for what sizes to prepare the wisdom, and with a bit of luck there is not too much variation between the devices.
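Such a size dump could be as small as the following C sketch, which reports each FFT size on stderr the first time it is requested. The function name, the message format and the table limit are assumptions for illustration; a real patch would call something like this from the plan-creation function in fft_fftw.c.

```c
#include <stdio.h>

#define MAX_TRACKED_SIZES 64

/* Reports an FFT size on stderr the first time it is requested.
   Returns 1 if the size was reported, 0 if it had been seen before.
   Once the table is full, untracked sizes are reported on every call. */
static int log_new_fft_size(int size)
{
    static int seen[MAX_TRACKED_SIZES];
    static int count = 0;

    for (int i = 0; i < count; i++)
        if (seen[i] == size)
            return 0;

    if (count < MAX_TRACKED_SIZES)
        seen[count++] = size;

    fprintf(stderr, "fft plan requested: size = %d\n", size);
    return 1;
}
```

Collecting this output over a typical session would show how many distinct sizes actually occur, and therefore whether a pre-built wisdom file is practical.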

smoe avatar Dec 08 '19 23:12 smoe

There is no need to know the size of a transformation for wisdom generation. You only have to know the size when you use the fftw-wisdom tool. For most applications you shouldn't use this tool anyway. Simply export the wisdom that fftw generates, when generating a plan for the transformation.

For example, the first function of https://github.com/simonyiszk/csdr/blob/master/fft_fftw.c should look something like this:

    ...
    fftwf_import_wisdom_from_filename("/etc/fftw/wisdom_c2c.dat");
    plan->plan = fftwf_plan_dft_1d( ... );
    fftwf_export_wisdom_to_filename("/etc/fftw/wisdom_c2c.dat");
    ...

An FFTW wisdom file can contain multiple transformation sizes. If the actual size isn't contained in the wisdom you have imported, fftw adds the information for the new size to the wisdom. This way your wisdom gets "better" over time.

N30dG avatar Dec 09 '19 16:12 N30dG