volk
Arctan avx512
Added AVX512 kernels and some minor cleanup.
Using AVX512F yields a ~40% speedup over the AVX2_FMA implementation on my 7950X3D. Compared to the generic atan2 implementation, this is a 65x speedup.
magnus@r7950x3d:~/src/kazam/volk/build$ volk_profile -R atan
RUN_VOLK_TESTS: volk_32f_atan_32f(131071,1987)
generic completed in 2010.97 ms
polynomial completed in 62.8946 ms
a_avx512 completed in 40.8423 ms
a_avx2_fma completed in 56.9026 ms
a_avx2 completed in 55.9691 ms
a_sse4_1 completed in 110.292 ms
u_avx512 completed in 39.5152 ms
u_avx2_fma completed in 55.4739 ms
u_avx2 completed in 55.6364 ms
u_sse4_1 completed in 110.009 ms
Best aligned arch: u_avx512
Best unaligned arch: u_avx512
RUN_VOLK_TESTS: volk_32fc_s32f_atan2_32f(131071,1987)
------> generic completed in 4199.28 ms
polynomial completed in 95.92 ms
------> a_avx512 completed in 64.0566 ms
a_avx2_fma completed in 99.0502 ms
a_avx2 completed in 98.3313 ms
------> u_avx512 completed in 63.4753 ms
u_avx2_fma completed in 98.3834 ms
u_avx2 completed in 98.6633 ms
Best aligned arch: u_avx512
Best unaligned arch: u_avx512
Writing /home/magnus/.volk/volk_config...
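The speedup figures quoted above can be checked against the logged timings. Here is a minimal sketch that uses the aligned (`a_`) timings copied from the volk_profile output. The `speedup` helper is only for illustration:

```python
# Timings in ms copied from the volk_profile log above (aligned "a_" variants).
timings = {
    "volk_32f_atan_32f": {"generic": 2010.97, "a_avx2_fma": 56.9026, "a_avx512": 40.8423},
    "volk_32fc_s32f_atan2_32f": {"generic": 4199.28, "a_avx2_fma": 99.0502, "a_avx512": 64.0566},
}

def speedup(slow_ms: float, fast_ms: float) -> float:
    """Ratio of run times; 1.40 means the fast kernel takes 1/1.40 of the time."""
    return slow_ms / fast_ms

for kernel, t in timings.items():
    vs_fma = speedup(t["a_avx2_fma"], t["a_avx512"])
    vs_generic = speedup(t["generic"], t["a_avx512"])
    print(f"{kernel}: {vs_fma:.2f}x vs a_avx2_fma, {vs_generic:.1f}x vs generic")
```

From these numbers, the ~40% gain over AVX2_FMA matches the atan kernel (about 1.39x). The atan2 kernel gains about 1.55x over AVX2_FMA and about 65.6x over generic.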
I just noticed there's a NaN test as well...
https://github.com/gnuradio/volk/pull/731
I need to update this PR with that test for AVX512 as well!
@Ka-zam #731 is essential for airspy-fmradion, and I've spent a few weeks solving the NaN issue. Please add the NaN test before completing your implementation.
I think it's already in there and should work fine! I wrote a test program, and atan2(0.f, 0.f) == 0.f for all kernels.
Here are some special values and a sanity check:
magnus@r7950x3d:~/src/kazam/scratch$ ./a.out
y : 1.00 -1.00 1.00 -1.00 nan nan 0.00 -0.00 1.00 -1.00
x : 1.00 1.00 -1.00 -1.00 1.00 nan 0.00 0.00 0.00 0.00
atan2(y, x):
generic : 0.79 -0.79 2.36 -2.36 nan nan 0.00 -0.00 1.57 -1.57
polynomial : 0.79 -0.79 2.36 -2.36 nan nan 0.00 -0.00 1.57 -1.57
a_avx512dq : 0.79 -0.79 2.36 -2.36 0.00 0.00 0.00 0.00 1.57 -1.57
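The generic and polynomial rows match the C99/IEEE 754 special-value rules for atan2: NaN inputs give NaN, and the result carries the sign of y. The a_avx512dq row does not follow these rules in the NaN and -0.00 columns. As a cross-check, the sketch below recomputes the expected row with Python's `math.atan2`. This is a stand-in reference only, not the volk generic kernel, and it runs in double rather than float precision:

```python
import math

# Same inputs as the test program above.
ys = [1.0, -1.0, 1.0, -1.0, math.nan, math.nan, 0.0, -0.0, 1.0, -1.0]
xs = [1.0, 1.0, -1.0, -1.0, 1.0, math.nan, 0.0, 0.0, 0.0, 0.0]

def reference_row(ys, xs):
    """C99-style atan2 per (y, x) pair; math.atan2 returns NaN for NaN input and keeps signed zeros."""
    return [math.atan2(y, x) for y, x in zip(ys, xs)]

row = reference_row(ys, xs)
print("reference  : " + " ".join(f"{v:.2f}" for v in row))
```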
Do we care about the sign of atan2(-0, 0)? What about propagating NaN?
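If the answer to both is yes, one option is a cheap fix-up pass after the polynomial. This is an assumption, not what the PR's kernels currently do. The pass propagates NaN from either input and copies the sign of y onto the result, which is valid because atan2(-y, x) == -atan2(y, x). Below is a scalar model of that per-lane logic. In AVX512 it could plausibly become an unordered-compare mask plus a blend, and a sign-bit copy. The helper name `fixup_atan2` is hypothetical:

```python
import math

def fixup_atan2(result: float, y: float, x: float) -> float:
    """Post-process one lane of a fast atan2 approximation (hypothetical helper).

    1. If either input is NaN, return NaN (x + y propagates it), matching C99 atan2.
    2. Otherwise force the result to carry the sign of y, since atan2 is odd in y;
       this turns a +0.0 computed for y = -0.0 into -0.0 and leaves other lanes unchanged.
    """
    if math.isnan(x) or math.isnan(y):
        return x + y
    return math.copysign(result, y)

# (fast_result, y, x): values the a_avx512dq row printed in the NaN and -0.00 columns.
cases = [(0.0, math.nan, 1.0), (0.0, math.nan, math.nan), (0.0, -0.0, 0.0)]
for r, y, x in cases:
    print(f"atan2({y}, {x}): {r} -> {fixup_atan2(r, y, x)}")
```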