Thrifty
Fastcard Segmentation fault on NixOS
We have ourselves a Heisenbug :ghost:
I realize NixOS isn't "officially" supported by this project, but I'm still going to try getting it going (NixOS makes deployment to Raspberry Pi very nice).
Any ideas on where this segmentation fault might be coming from would be very appreciated. I've documented my research below.
I've compiled fastcard using CMake, using the most recent libraries available on NixOS unstable. That is fftw-3.3.8 and gnuradio-3.7.13.4 (libvolk). Ubuntu 16 has fftw-3.3.4 and libvolk-1.2.1. I've also tried using libvolk-1.2.1 with no luck.
Clue # 1 is that the segfault occurs when calling a function from libvolk:
https://github.com/swkrueger/Thrifty/blob/2ad9775753a8712a61c81cc78fb0bc75a921d50b/fastcard/fastcard.c#L180
Clue # 2 is that fastcard does not segfault when the block size is set to less than 4096
$ ./fastcard -i rtlsdr -b 4095 -h 4000
# works..
$ ./fastcard -i rtlsdr -b 4096 -h 4000
# ...snip...
# Segmentation Fault
Clue # 3 is that fastcard does not segfault inside valgrind
$ valgrind ./fastcard -i rtlsdr
# works...
My hunch is that it has to do with the two newer library versions. It's even possible that this bug is not occurring in this project's code. But if something comes to mind, it would be great to see this project running on Nix :heart_decoration:
Thank you for the bug report.
Does it work when you read data from a file instead of the RTL-SDR? For example:
$ rtl_sdr -g 5 -f 433.83M -s 2.4M data.bin # capture data, hit Ctrl-C to stop
$ ./fastcard -i data.bin -b 4096 -h 4000
Did you try using gdb with a debug build? Also, I would recommend reading from a file instead of directly from the rtlsdr when using valgrind.
Does it work when you use volk_32fc_magnitude_squared_32f_u instead of volk_32fc_magnitude_squared_32f_a? It might be possible that the FFTW alignment does not match the volk alignment for some reason.
Thanks for your reply,
Changing the volk function to volk_32fc_magnitude_squared_32f_u seems to have fixed it. Does that mean it was an alignment issue? I can't find any documentation on the difference between the two functions.
Using a data.bin file solved the issue for the specific command I used, by the way. Segfaults seemed to be affected by how many files the process had open. I wish I had followed your second suggestion first :laughing:
What next? Is this a special case to be added to the CMakeLists, or an issue-close and a patch for my nix package? Either suits me well.
Yeah, it is probably a memory alignment issue. volk_32fc_magnitude_squared_32f_a assumes that the memory has been allocated with volk_malloc, which ensures that the memory is properly aligned for SIMD instructions (Neon in the case of ARM). volk_32fc_magnitude_squared_32f_u is for unaligned memory and would be slower and make use of the generic algorithm without Neon instructions.
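To make the aligned/unaligned distinction concrete, here is a minimal sketch of an alignment check. The helper name `is_aligned` is hypothetical and not part of fastcard or volk; in fastcard you would presumably compare the FFT output pointer against whatever alignment libvolk reports on your machine.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from fastcard or volk): returns 1 if the address
 * p is a multiple of `alignment` bytes, 0 otherwise.
 * `alignment` must be a power of two. */
static int is_aligned(const void *p, size_t alignment)
{
    return ((uintptr_t)p & (uintptr_t)(alignment - 1)) == 0;
}
```

A buffer that passes this check for 16 bytes but fails it for 32 would be fine for one SIMD width and unsafe for another, which is the kind of mismatch suspected here.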
The issue is probably that the Nix package for either FFTW or libvolk is compiled without Neon support. I was in a rush when I implemented fastcard and cut corners. I assumed that FFTW's alignment would be the same as libvolk's alignment, which I think is the case for the Rpi configuration, library versions and architecture I used. It could be that FFTW is using a different alignment, or that the FFTW library you are using is compiled without Neon support and thus not performing any special alignment when fftw_malloc is called. My guess is that it is FFTW. I vaguely remember something about the official Raspbian package for FFTW including a patch to enable Neon. If I remember correctly, you can check the contents of the wisdom file generated by fastcard to see whether FFTW is using Neon or not; it should contain something like fftwf_codelet_t2bv_16_neon.
You can probably fix the bug by replacing fftwf_malloc(num_bytes) with volk_malloc(num_bytes, alignment) in fastcard/fft.c. But then you'll have the same issue with the opposite configuration, where FFTW is compiled with Neon support and libvolk is not.
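For illustration, this is roughly what an allocator with an explicit alignment argument does. It is a sketch using only the C11 standard library; `alloc_aligned` is a hypothetical stand-in, not the volk or FFTW implementation, and the alignment you pass would have to come from the libraries on your system.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for an aligned allocator (not volk_malloc itself):
 * returns at least num_bytes of storage whose address is a multiple of
 * `alignment`, or NULL on failure. `alignment` must be a power of two.
 * Release the result with free(). */
static void *alloc_aligned(size_t num_bytes, size_t alignment)
{
    /* C11 aligned_alloc expects the size to be a multiple of the alignment,
     * so round the request up. */
    size_t rounded = (num_bytes + alignment - 1) / alignment * alignment;
    return aligned_alloc(alignment, rounded);
}
```

Requesting the larger of the two libraries' alignments would, in principle, satisfy both, but I have not tested that against fastcard.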
Assuming that my hunch is correct regarding Nix's FFTW not using Neon on the Rpi, you can basically choose any one of the following three solutions:
- Fix the Nix package for FFTW to compile it with Neon instructions on the Rpi.
- Use volk_32fc_magnitude_squared_32f_u instead of volk_32fc_magnitude_squared_32f_a and take the performance hit of using both the volk kernel and FFTW without Neon instructions.
- Use volk_malloc instead of fftwf_malloc and take the performance hit of using FFTW without Neon instructions.
Oh, and the number 4096 actually makes sense. It is 4K, which is probably the size of a page. You can check the virtual memory page size using getconf PAGESIZE in a shell. What could be happening is that the volk operation is going out of bounds into the next page when it starts from a misaligned address. This would result in a segfault if the next page isn't allocated. There could be cases where more memory is allocated next to that page, e.g. potentially when you read from a file and the file is mapped into the virtual address space, in which case it will not result in a segfault (but will probably lead to incorrect results and unexpected behaviour).
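The page-boundary idea can be sketched in a few lines of C. Both function names below are hypothetical, and the numbers in the usage note assume 8-byte complex float samples and a page-aligned buffer start, neither of which I have verified for fastcard's actual buffers.

```c
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Page size as reported by the OS (same value as `getconf PAGESIZE`). */
static long query_page_size(void)
{
    return sysconf(_SC_PAGESIZE);
}

/* Hypothetical helper: returns 1 if reading `overrun` bytes past the end of
 * a buffer touches a page that the buffer itself does not occupy.
 * buf_len must be greater than zero. */
static int overrun_leaves_last_page(uintptr_t buf_start, size_t buf_len,
                                    size_t overrun, size_t page)
{
    uintptr_t last_valid = buf_start + buf_len - 1; /* last byte in buffer */
    uintptr_t last_read = last_valid + overrun;     /* last byte touched   */
    return (last_valid / page) != (last_read / page);
}
```

Under those assumptions, a block of 4096 samples is 32768 bytes and ends exactly on a 4096-byte page boundary, so any overrun lands in the next page. A block of 4095 samples ends 8 bytes short of the boundary, so a small overrun stays inside a page that is already mapped.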