gnuradio
std::bad_alloc in qa_polar_decoder_sc_systematic
https://github.com/gnuradio/gnuradio/pull/4981/checks?check_run_id=3346035979#step:6:231
On Debian 11 on GitHub CI.
First off, "Good riddance log4cpp" is a funny title.
Besides, I don't know why this would happen. The QA test might throw std::bad_alloc because the CI server is out of memory or something. For now, I'd actually remove the "Bug" label. I don't have a Debian 11 system handy. Ubuntu 20.04 seems to pass. Does QA fail in a bullseye Docker image?
FYI, I tried reproducing this on my own system (in the same Docker container as the CI) and I saw a massive CPU spike. Maybe I would have eventually gotten the std::bad_alloc.
...what I'm trying to say is, yeah, I think it's an out-of-memory problem. But not because the CI runner is low on memory. This looks like an infinite loop, or even fork bomb kind of behaviour.
I'm getting a million of these:
114: Volk warning: no arch found, returning generic impl
114: Volk warning: no arch found, returning generic impl
114: Volk warning: no arch found, returning generic impl
114: Volk warning: no arch found, returning generic impl
114: Volk warning: no arch found, returning generic impl
New Debian 11 CI:
113/247 Test #113: qa_polar_decoder_sc_list ..................... Passed 0.41 sec
Start 114: qa_polar_decoder_sc_systematic
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
Aborted (core dumped)
Error: Process completed with exit code 134.
This is a bit too deterministic for "the container just happened to run out of RAM".
> This is a bit too deterministic for "the container just happened to run out of RAM".
I was trying to say the opposite. And this is easily reproducible if you grab the Docker container and run it locally.
@mbr0wn wasn't a reaction to you but to Johannes much much earlier :)
I've been reproducing this a lot, and then it struck me that I'd been stupid:
(gdb) bt
#0 volk_get_index (impl_name=<optimized out>, n_impls=<optimized out>, impl_names=<optimized out>) at ./lib/volk_rank_archs.c:44
#1 volk_rank_archs (kern_name=kern_name@entry=0x7fd65caaf788 "volk_8u_x2_encodeframepolar_8u", impl_names=impl_names@entry=0x7fd65cb8da90 <volk_machine_avx2_64_mmx_orc+67632>, impl_deps=impl_deps@entry=0x7fd65cb8db50 <volk_machine_avx2_64_mmx_orc+67824>,
alignment=alignment@entry=0x7fd65cb8dbb0 <volk_machine_avx2_64_mmx_orc+67920>, n_impls=n_impls@entry=4, align=align@entry=true) at ./lib/volk_rank_archs.c:68
#2 0x00007fd65c986088 in __init_volk_8u_x2_encodeframepolar_8u () at ./obj-x86_64-linux-gnu/lib/volk.c:10997
#3 __volk_8u_x2_encodeframepolar_8u (frame=0x10d7a80 "", temp=0x12c1da0 "", frame_size=16) at ./obj-x86_64-linux-gnu/lib/volk.c:11022
#4 0x00007fd65b998230 in gr::fec::code::polar_decoder_sc_systematic::generic_work(void*, void*) () from /gnuradio/build/gr-fec/lib/libgnuradio-fec.so.3.10.0git
#5 0x00007fd65b95a37f in gr::fec::decoder_impl::general_work(int, std::vector<int, std::allocator<int> >&, std::vector<void const*, std::allocator<void const*> >&, std::vector<void*, std::allocator<void*> >&) () from /gnuradio/build/gr-fec/lib/libgnuradio-fec.so.3.10.0git
#6 0x00007fd65cf2de7a in gr::block_executor::run_one_iteration() () from /gnuradio/build/gnuradio-runtime/lib/libgnuradio-runtime.so.3.10.0git
#7 0x00007fd65cfa159a in gr::tpb_thread_body::tpb_thread_body(std::shared_ptr<gr::block>, std::shared_ptr<boost::barrier>, int) () from /gnuradio/build/gnuradio-runtime/lib/libgnuradio-runtime.so.3.10.0git
#8 0x00007fd65cf92b44 in gr::thread::thread_body_wrapper<gr::tpb_container>::operator()() () from /gnuradio/build/gnuradio-runtime/lib/libgnuradio-runtime.so.3.10.0git
#9 0x00007fd65d1bf787 in boost::(anonymous namespace)::thread_proxy (param=<optimized out>) at libs/thread/src/pthread/thread.cpp:179
#10 0x00007fd65f0c3ea7 in start_thread (arg=<optimized out>) at pthread_create.c:477
#11 0x00007fd65ee56def in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
(gdb) n
Thread 3 "fec_decoder4" hit Breakpoint 2, volk_get_index (impl_name=<optimized out>, n_impls=<optimized out>, impl_names=<optimized out>) at ./lib/volk_rank_archs.c:36
36 in ./lib/volk_rank_archs.c
(gdb) bt
#0 volk_get_index (impl_name=<optimized out>, n_impls=<optimized out>, impl_names=<optimized out>) at ./lib/volk_rank_archs.c:36
#1 volk_rank_archs (kern_name=kern_name@entry=0x7fd65caaf788 "volk_8u_x2_encodeframepolar_8u", impl_names=impl_names@entry=0x7fd65cb8da90 <volk_machine_avx2_64_mmx_orc+67632>, impl_deps=impl_deps@entry=0x7fd65cb8db50 <volk_machine_avx2_64_mmx_orc+67824>,
alignment=alignment@entry=0x7fd65cb8dbb0 <volk_machine_avx2_64_mmx_orc+67920>, n_impls=n_impls@entry=4, align=align@entry=true) at ./lib/volk_rank_archs.c:68
#2 0x00007fd65c986088 in __init_volk_8u_x2_encodeframepolar_8u () at ./obj-x86_64-linux-gnu/lib/volk.c:10997
#3 __volk_8u_x2_encodeframepolar_8u (frame=0x10d7a80 "", temp=0x12c1da0 "", frame_size=16) at ./obj-x86_64-linux-gnu/lib/volk.c:11022
#4 0x00007fd65b998230 in gr::fec::code::polar_decoder_sc_systematic::generic_work(void*, void*) () from /gnuradio/build/gr-fec/lib/libgnuradio-fec.so.3.10.0git
#5 0x00007fd65b95a37f in gr::fec::decoder_impl::general_work(int, std::vector<int, std::allocator<int> >&, std::vector<void const*, std::allocator<void const*> >&, std::vector<void*, std::allocator<void*> >&) () from /gnuradio/build/gr-fec/lib/libgnuradio-fec.so.3.10.0git
#6 0x00007fd65cf2de7a in gr::block_executor::run_one_iteration() () from /gnuradio/build/gnuradio-runtime/lib/libgnuradio-runtime.so.3.10.0git
#7 0x00007fd65cfa159a in gr::tpb_thread_body::tpb_thread_body(std::shared_ptr<gr::block>, std::shared_ptr<boost::barrier>, int) () from /gnuradio/build/gnuradio-runtime/lib/libgnuradio-runtime.so.3.10.0git
#8 0x00007fd65cf92b44 in gr::thread::thread_body_wrapper<gr::tpb_container>::operator()() () from /gnuradio/build/gnuradio-runtime/lib/libgnuradio-runtime.so.3.10.0git
#9 0x00007fd65d1bf787 in boost::(anonymous namespace)::thread_proxy (param=<optimized out>) at libs/thread/src/pthread/thread.cpp:179
#10 0x00007fd65f0c3ea7 in start_thread (arg=<optimized out>) at pthread_create.c:477
#11 0x00007fd65ee56def in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
(gdb)
This is a tail recursion in volk_get_index. Sadly, it will never terminate (until it causes memory exhaustion in ctest, I guess).
volk_get_index is a bag of mixed emotions for me:
int volk_get_index(const char* impl_names[], // list of implementations by name
                   const size_t n_impls,     // number of implementations available
                   const char* impl_name     // the implementation name to find
)
{
    unsigned int i;
    for (i = 0; i < n_impls; i++) {
        if (!strncmp(impl_names[i], impl_name, 20)) {
            return i;
        }
    }
    // TODO return -1;
    // something terrible should happen here
    fprintf(stderr, "Volk warning: no arch found, returning generic impl\n");
    return volk_get_index(impl_names, n_impls, "generic"); // but we'll fake it for now
}
- strncmp(a,b,20): We'll happily treat impls as equal when they differ only after the first 20 characters.
- It never checks whether the impl that can't be found is the _generic one. If that is the case, it just recurses into looking up the generic one, and we end up where we are.
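To make the two points above concrete, here is a minimal sketch of a lookup that terminates. It is not the upstream VOLK fix; the name volk_get_index_checked and the choice to return -1 are my assumptions, picked up from the "TODO return -1" comment in the original. It compares full names with strcmp and stops recursing once the name it failed to find is already "generic".

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical variant of volk_get_index, for illustration only.
 * Returns the index of impl_name, the index of "generic" as a fallback,
 * or -1 if neither is present. */
int volk_get_index_checked(const char* impl_names[], /* list of implementations by name */
                           const size_t n_impls,     /* number of implementations available */
                           const char* impl_name     /* the implementation name to find */
)
{
    size_t i;
    for (i = 0; i < n_impls; i++) {
        /* Compare the whole name, not just the first 20 characters. */
        if (strcmp(impl_names[i], impl_name) == 0) {
            return (int)i;
        }
    }
    /* If even "generic" is missing, give up instead of recursing forever. */
    if (strcmp(impl_name, "generic") == 0) {
        fprintf(stderr, "Volk error: no generic impl found\n");
        return -1;
    }
    fprintf(stderr, "Volk warning: no arch found, returning generic impl\n");
    /* Recurses at most once, because the "generic" case returns above. */
    return volk_get_index_checked(impl_names, n_impls, "generic");
}
```

A caller such as volk_rank_archs would then have to handle the -1 case explicitly, e.g. by aborting with a clear message, rather than spinning at 100% CPU.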
What is confusing is why neither volk_get_info nor the volk_profile tool knows of volk_8u_x2_encodeframepolar_8u_generic. In fact, the latter doesn't seem to want to know any implementations of that at all.
@jdemel I feel like I'm lacking the overview here. What's the reason for volk_profile -R encodepolar not yielding anything? Might that be related to why there's no such impl?
The Debian 11 container ships a binary version of VOLK. We still need to fix this (probably in VOLK), but it's not something that we can easily inject to make CI pass. We'll need to disable this test on Debian 11 instead.
This happens in Ubuntu 22.04 as well
This is a difficult-to-debug issue. E.g., I start a Docker container with
docker run -it --rm ubuntu:22.04 bash
and then inside the container
apt update
apt install libvolk2-dev
volk_profile -R encodepolar
everything runs fine. I'm unable to reproduce the issue, which makes it difficult to investigate. If we find out how to reproduce it under more circumstances, that would be great.
I think this debian patch is the culprit:
https://sources.debian.org/patches/volk/2.4.1-2/make-acc-happy/
It removes #ifdef LV_HAVE_GENERIC from volk_8u_x2_encodeframepolar_8u.h, which prevents the generic implementation from being found. With the generic implementation gone, volk_get_index is then very unhappy.
> volk_profile -R encodepolar
> everything runs fine.
I don't think it's possible to reproduce with volk_profile.
The failure can be seen by running the qa_polar_encoder_systematic or qa_polar_decoder_sc_systematic tests, which exercise the two GNU Radio blocks that depend on the broken volk_8u_x2_encodeframepolar_8u kernel.
Hi guys. Debian 11 Bullseye. Same error. Pic b4 crash. Also these processes were left behind. 100% CPU FYI.
59164 user 20 0 426840 71640 37100 S 100.0 0.2 4:49.39 python3
46103 user 20 0 426840 70832 36300 S 99.7 0.2 13:41.55 python3
JT
I reported the issue to the Debian maintainer and it was fixed in new releases (unstable, testing & Debian 12), but it was not fixed in Debian 11. So you'll either have to live with the bug, build VOLK from source, or upgrade to a newer Debian.
The Debian maintainer fixed it in volk 2.5.2-2.
Ubuntu Jammy (22.04) ships 2.5.1-1 which is broken.
Ubuntu Lunar (23.04) ships 2.5.2-3, which is probably fixed (I haven't confirmed).
fixed upstream