volk icon indicating copy to clipboard operation
volk copied to clipboard

index: Fix volk_get_index

Open jdemel opened this issue 3 years ago • 11 comments

This function results in an infinite loop on Debian 11 for some impls. This is a first step to fix it.

@mbr0wn , @marcusmueller does that fix the reported issue? If you could point me to a CI script for Debian 11 that we can integrate here, that'd be very helpful.

Fix #516

EDIT This is a minimal Debian 11 Container to test things.

FROM debian:bullseye

RUN apt-get update \
        && apt-get install -y \
        build-essential git cmake python3-mako python3-distutils liborc-dev

GNU Radio CI container for Debian 11 where volk_profile hangs on volk_32f_8u_polarbutterflypuppet_32f. The corresponding Dockerfile: Dockerfile.

jdemel avatar Sep 26 '21 14:09 jdemel

Don't we need to handle -1 in the calling functions to avoid calling into random memory?

marcusmueller avatar Sep 26 '21 15:09 marcusmueller

Don't we need to handle -1 in the calling functions to avoid calling into random memory?

+1 from me on this: looks like that's just in tmpl/volk.tmpl.c : 191

michaelld avatar Sep 26 '21 15:09 michaelld

I fixed the "file local comments". Thus, we need to discuss how to deal with the -1 return value. Why did we avoid to deal with it until now? We ended up in an infinite loop and thus terminated useful execution at this point. Considering this, the actual bug report should be fixed now.

volk_get_index is used in 2 functions:

  1. lib/volk_rank_archs.c:volk_rank_archs: return values propagate here.
  2. tmpl/volk.tmpl.c:186, i.e. in all _manual kernel calls.

volk_rank_archs is used in "one" place.

  • tmpl/volk.tmpl.c:151-152: kernel init. Thus, the first call to any kernel.

Both functions in tmpl/volk.tmpl.c return void. We can't signal errors that way. We'd have to change the API. We go back and forth between signed and unsigned variables in these functions. But that's a separate issue.

We could treat -1 in both functions as: use the first impl in the list and hope it is supported on your system. I'd really like to raise a NotImplementedError but that's obviously not an option either.

I know this is an unsatisfactory but we could ignore this problem for now and add it to the list of issues to fix for our next major release. Would that be a viable option?

jdemel avatar Oct 02 '21 10:10 jdemel

I reverted the change where ! is replaced with != 0. It caused issues with most systems except Ubuntu x86. I rebased to the latest main as well.

jdemel avatar Oct 02 '21 10:10 jdemel

I reverted the change where ! is replaced with != 0. It caused issues with most systems except Ubuntu x86. I rebased to the latest main as well.

interesting ... I wonder why? Anyway, ! is fine

michaelld avatar Oct 02 '21 17:10 michaelld

This PR fixes the direct issue, which is important. It does change the API unfortunately by now returning -1 where it didn't before. I guess as an interim we could instead have it return the first entry (if there is one) & print a warning noting as such. Of course if there is no entry then we'd want to return -1 ... so, does this help? Hmmm ...

michaelld avatar Oct 02 '21 17:10 michaelld

This PR fixes the direct issue, which is important. It does change the API unfortunately by now returning -1 where it didn't before. I guess as an interim we could instead have it return the first entry (if there is one) & print a warning noting as such. Of course if there is no entry then we'd want to return -1 ... so, does this help? Hmmm ...

It sounds like every solution somehow breaks the API. However, at the moment the broken API won't work at all. After this change, it will. If we'd do another 2.x release, I'd argue this fix should go in there. I added a message "Volk ERROR ...". If we'd return 0;, we'd most probably cause an "illegal instruction error".

Besides, it might be time to add a Debian test. However, Debian 'bullseye' ships GCC 10. We already have a test with GCC 10. The question would be: Is this distro specific?

jdemel avatar Oct 03 '21 07:10 jdemel

By the way, how do we figure out what breaks finding that generic implementation? even if this PR was perfect in every way, it would not solve the underlying issue that leads to https://github.com/gnuradio/volk/issues/516 . And maybe fixing that would completely rewrite this part of code, anyway...

marcusmueller avatar Oct 04 '21 14:10 marcusmueller

Updated the initial PR description. I started some tests with:

FROM debian:bullseye

RUN apt-get update \
        && apt-get install -y \
        build-essential git cmake python3-mako python3-distutils liborc-dev

I couldn't find liborc-dev and python3-distutils in the GNU Radio CI Docker file. liborc-dev is essentially a dependency and should be available anyways. python3-distutils is only a build dependency. So it won't be installed with a binary.

Besides, I can't reproduce the issue in my minimal container with the current main. Debian 11 ships VOLK 2.4.1 which I didn't test yet.

jdemel avatar Oct 04 '21 17:10 jdemel

liborc-dev is essentially a dependency and should be available anyways.

liborc-dev should only be a build-time dependency, right?

marcusmueller avatar Oct 04 '21 17:10 marcusmueller

I converted this PR to a draft. At the moment I doubt it really fixes the issue it is supposed to fix. We need a better understanding of the issue here.

jdemel avatar Oct 08 '21 10:10 jdemel

This draft PR was introduced because of a bug that turned out to be outside of VOLK. I'm closing this PR now.

jdemel avatar Sep 15 '23 08:09 jdemel