volk icon indicating copy to clipboard operation
volk copied to clipboard

Building Volk on Android with NEON Support. volk_32fc_convert_16ic_neon does not compile

Open Aang23 opened this issue 1 year ago • 5 comments

In the past, while attempting to build Volk with the Android NDK, NEON detection would not work properly, so I had to patch the CMake manually for it to work, and usually compromise with no NEON support.

Thankfully, this all appears to be fully working now (may also be due to newer NDKs not being as weird...). I was able to build Volk with full SIMD support for all 4 archs, so that's awesome news!

Though, volk_32fc_convert_16ic_neon _VOLK_ASM calls fail at compile-time, so I had to disable the NEON variant for things to work. This is the only kernel that does not build properly with the NDK.

Screenshot from 2022-08-01 14-16-15

Edit : This appears to be an Android-specific issue. Building on another armeabi-v7a machine running a standard GCC-based toolchain works as expected. Maybe just disable building this kernel if __ANDROID__ is set. Personally I'd prefer that over having to patch it.

Aang23 avatar Aug 01 '22 12:08 Aang23

This is a special macro defined here: https://github.com/gnuradio/volk/blob/af69c60e908cd64b3bacd4ee6b4ce4a52b711865/kernels/volk/volk_32fc_convert_16ic.h#L166-L179

Furthermore, it is only used here:

https://github.com/gnuradio/volk/blob/af69c60e908cd64b3bacd4ee6b4ce4a52b711865/kernels/volk/volk_32fc_convert_16ic.h#L216-L217

Afterwards an https://github.com/gnuradio/volk/blob/af69c60e908cd64b3bacd4ee6b4ce4a52b711865/kernels/volk/volk_32fc_convert_16ic.h#L237 follows.

At the moment, we don't have Android CI. Thus, it'd be difficult to maintain a change.

It seems like this change got introduced in https://github.com/gnuradio/volk/commit/d5deb3a5ec2f3744ff48a9fbceb464ccf165f1ef . Before that, this macro didn't exist.

I'd like to narrow down the issue. Could you provide a bit more info?

  • Which Android version do you target? Since this is ARMv7 (or should be at least)
  • Which hardware do you target?
  • Which compiler do you use? Maybe we can improve this section with smth better that wasn't (widely) available 3 years ago.

jdemel avatar Aug 01 '22 12:08 jdemel

If you want, I could PR an Android CI configuration. It's something I already have for other projects so it would not be too hard.

I can try and revert this change, see if that fixes it.

I'm targetting Android API 23, no specific hardware as this application is intended to runs "on anyone's phone". So I'm simply targetting armeabi-v7a, arm64-v8a, x86 and x86_64 as the NDK requires when building an APK utilizing the NDK.

The compiler being utilized is Clang, using the toolchain file provided in commandlinetools-linux-8512546. (Downloaded a few weeks ago)

Aang23 avatar Aug 01 '22 12:08 Aang23

Ok, using the implementation before https://github.com/gnuradio/volk/commit/d5deb3a5ec2f3744ff48a9fbceb464ccf165f1ef builds just fine.

Aang23 avatar Aug 01 '22 12:08 Aang23

@Aang23 PRs are always welcome :smiley:

Just to make sure, only the armeabi-v7a fails? All others compile as expected, e.g. arm64-v8a?

I feel like I just found the ARM NEON intrinsics equivalent of the intel intrinsics guide.

The macro tries to mimic the vcvtaq_s32_f32 intrinsic.

The ARMv8 solution looks simple enough: https://github.com/gnuradio/volk/blob/af69c60e908cd64b3bacd4ee6b4ce4a52b711865/kernels/volk/volk_32fc_convert_16ic.h#L276-L277

More details about this change are discussed in #296 .

jdemel avatar Aug 01 '22 13:08 jdemel

I will do it in the next few days. Won't run tests, but I will make it check it builds on all 4 archs required. (though since tests are being run on all of those in other actions, I wouldn't worry about it)

Only seems to happen on v7a.

Aang23 avatar Aug 01 '22 22:08 Aang23

So, I am considering making a PR to fix this very soon (possibly today). I have tried 2 ways to fix this :

  • Revert https://github.com/gnuradio/volk/commit/d5deb3a5ec2f3744ff48a9fbceb464ccf165f1ef. This solves all build issues on v7a, and at least in the tests I could perform on Android, appears to work (but I'd assume this change had been made for a reason back then).
  • Disable the v7a NEON kernel on Android. That way the current implementation can remain unchanged while letting Android builds functional. This does mean Android will not benefit from the NEON implementation, but it's definitely better than it not working at all (this is the option I've went with personally).

What do you think @jdemel @michaelld ?

Aang23 avatar Aug 16 '22 15:08 Aang23

@Aang23 that commit is for a single kernel: volk_32fc_convert_16ic. Is it a compile issue then, that the change from the commit break compiling? Or is it runtime, that the kernel doesn't pass test? I'm assuming then that other kernels build on Andriod v7a? It would awesome if someone could find a way to fix that kernel to work ...

michaelld avatar Aug 16 '22 15:08 michaelld

@michaelld this kernel is the only one in all of Volk that does not compile with the Android toolchain. The changes in this commit cause the error initially mentionned in this issue. Yes... At least to me it appears it still does fine without those changes, but modiying it without extensive testing is a risk, which is why I personally just disabled it for now.

Aang23 avatar Aug 16 '22 15:08 Aang23

@Aang23 guessing the issue is the macro VCVTRQ_S32_F32, since that's assembly code & thus potentially "volatile" across different NEON architectures. Some internet sleuthing turns up that the =t should really be =r (all 4) ... can you try that & see if that helps? ['t' == "VFP floating-point registers s0-s31. Used for 32 bit values." ; 'r' == "A register operand is allowed provided that it is in a general register." ... and is typically used for 32-bit signed integer values]

michaelld avatar Aug 16 '22 16:08 michaelld

@michaelld Sure, but what =t are you referring to?

#define VCVTRQ_S32_F32(res, val)                \
    __VOLK_ASM("VCVTR.S32.F32 %[r0], %[v0]\n\t" \
               : [r0] "=w"(res[0])              \
               : [v0] "w"(val[0])               \
               :);                              \
    __VOLK_ASM("VCVTR.S32.F32 %[r1], %[v1]\n\t" \
               : [r1] "=w"(res[1])              \
               : [v1] "w"(val[1])               \
               :);                              \
    __VOLK_ASM("VCVTR.S32.F32 %[r2], %[v2]\n\t" \
               : [r2] "=w"(res[2])              \
               : [v2] "w"(val[2])               \
               :);                              \
    __VOLK_ASM("VCVTR.S32.F32 %[r3], %[v3]\n\t" : [r3] "=w"(res[3]) : [v3] "w"(val[3]) :);

Aang23 avatar Aug 16 '22 16:08 Aang23

Just from the commit:

#define VCVTRQ_S32_F32(result, value)					\
  __VOLK_ASM("VCVTR.S32.F32 %0, %1" : "=t"(result[0]) : "t"(value[0]) :); \
  __VOLK_ASM("VCVTR.S32.F32 %0, %1" : "=t"(result[1]) : "t"(value[1]) :); \
  __VOLK_ASM("VCVTR.S32.F32 %0, %1" : "=t"(result[2]) : "t"(value[2]) :); \
  __VOLK_ASM("VCVTR.S32.F32 %0, %1" : "=t"(result[3]) : "t"(value[3]) :);

michaelld avatar Aug 16 '22 16:08 michaelld

I believe should instead be:

#define VCVTRQ_S32_F32(result, value)					\
  __VOLK_ASM("VCVTR.S32.F32 %0, %1" : "=r"(result[0]) : "t"(value[0]) :); \
  __VOLK_ASM("VCVTR.S32.F32 %0, %1" : "=r"(result[1]) : "t"(value[1]) :); \
  __VOLK_ASM("VCVTR.S32.F32 %0, %1" : "=r"(result[2]) : "t"(value[2]) :); \
  __VOLK_ASM("VCVTR.S32.F32 %0, %1" : "=r"(result[3]) : "t"(value[3]) :);

michaelld avatar Aug 16 '22 16:08 michaelld

Ok, that's different. The code above (with no changes) compiles. https://github.com/gnuradio/volk/commit/0854b232ff473ca2f7fa99659c5ce9ee871454d4 which included changes to the Macro is the one breaking it.

Aang23 avatar Aug 16 '22 16:08 Aang23

Ah ... gotcha. Well looking at the other macro code, the =w is wrong then. w == "Floating point register, Advanced SIMD vector register or SVE vector register". This is fine to the right-hand argument (w) -- though t should also work & might be the better option since it is 32-bit specific (worth trying IMHO). The left-hand argument is of type S32 (32-bit signed integer), which again should be =r. Can you try the =r first & see if that works? If so, can you also try t instead of w?

michaelld avatar Aug 16 '22 17:08 michaelld

=r does not build, but =t does.

#define VCVTRQ_S32_F32(res, val)                \
    __VOLK_ASM("VCVTR.S32.F32 %[r0], %[v0]\n\t" \
               : [r0] "=t"(res[0])              \
               : [v0] "t"(val[0])               \
               :);                              \
    __VOLK_ASM("VCVTR.S32.F32 %[r1], %[v1]\n\t" \
               : [r1] "=t"(res[1])              \
               : [v1] "t"(val[1])               \
               :);                              \
    __VOLK_ASM("VCVTR.S32.F32 %[r2], %[v2]\n\t" \
               : [r2] "=t"(res[2])              \
               : [v2] "t"(val[2])               \
               :);                              \
    __VOLK_ASM("VCVTR.S32.F32 %[r3], %[v3]\n\t" : [r3] "=t"(res[3]) : [v3] "t"(val[3]) :);

Aang23 avatar Aug 16 '22 17:08 Aang23

Interesting. Can you try =r and t respectively ... that's really the correct combination.

michaelld avatar Aug 16 '22 17:08 michaelld

That does not compile. Interesting as it appears nearly identical to the previous, working Macro.

Aang23 avatar Aug 16 '22 17:08 Aang23

Strange. If you go with =t and t respectively, it sounds like the build succeeds ... which is progress! does make test succeed? or volk_profile? I'm now curious if that code really works once compiled.

michaelld avatar Aug 16 '22 18:08 michaelld

Took a while to test... Had to root an old Android device first (to test on real HW, and not a VM). Tests passed. volk_profile also worked. If anyone has a "better" machine running armv7a, it might be a good idea to confirm on something else.

Edit : Might have found something else to test on... Will report later.

Aang23 avatar Aug 16 '22 21:08 Aang23

Well ... I'd say to create a PR with that fix & we'll get it in merged. No idea why it works, but the code makes a lot more sense with t than w given AARCH64 GCC machine constraint definitions.

michaelld avatar Aug 16 '22 23:08 michaelld

Ok. I will do a bit more tested, then do a PR. Though perhaps, we could also go back to the initial code? That utilized the correct =r & t, and also passes tests here.

Aang23 avatar Aug 17 '22 11:08 Aang23

Give them all a go & whatever works on Android armv7a I'll be also works for other aarch64 and NEON compilers

michaelld avatar Aug 17 '22 14:08 michaelld

@michaelld see https://github.com/gnuradio/volk/pull/593

Aang23 avatar Aug 17 '22 21:08 Aang23

@Aang23 merged! I think we're done with this issue now. If you agree, go ahead and close it out. Thx for your work!

michaelld avatar Aug 18 '22 13:08 michaelld

@michaelld awesome. I can confirm - the current master branch builds and passes tests fine on Android (and other platforms) just fine now.

I will close this issue, but I'm of the opinion #594 should be worked on still. Having some sort of CI for every archs is pretty important... If the current system doesn't work, I do have some ideas - but I will carry on in the PR.

Thanks!

Aang23 avatar Aug 18 '22 23:08 Aang23