volk sve: Add ARM SVE compile support

This commit adds code to support SVE+SVE2. However, since I don't have any real hardware available, it is mostly guesswork.

Jul 09 '24 21:07 jdemel

Is it really sensible to merge this then? No support would seem to be better than broken support; or am I wrong about that?

Jul 10 '24 14:07 marcusmueller

I wanted to look into support for these extensions because I expect them to be available in all ARMv9 CPUs. I converted the PR to draft. Maybe someone wants to pick up the draft and is able to test it. This might be a good start. It's also the reason why I shared the code in this state.

Jul 11 '24 07:07 jdemel

I have an SVE-capable ARM server available and ran a quick test on @jdemel's branch at commit 63ca7096affce2cac815ec1c229d74f21de35e35.

This is the build output:

-- Available machines: generic;neon;neonv8;sve;sve2
-- BUILD TYPE = RELEASE
-- Base cflags = -O3 -DNDEBUG  -fcx-limited-range -Wall -Werror=incompatible-pointer-types -Werror=pointer-sign
-- BUILD INFO ::: generic ::: GNU ::: -O3 -DNDEBUG  -fcx-limited-range -Wall -Werror=incompatible-pointer-types -Werror=pointer-sign
-- BUILD INFO ::: neon ::: GNU ::: -O3 -DNDEBUG  -fcx-limited-range -Wall -Werror=incompatible-pointer-types -Werror=pointer-sign -funsafe-math-optimizations
-- BUILD INFO ::: neonv8 ::: GNU ::: -O3 -DNDEBUG  -fcx-limited-range -Wall -Werror=incompatible-pointer-types -Werror=pointer-sign -funsafe-math-optimizations -funsafe-math-optimizations
-- BUILD INFO ::: sve ::: GNU ::: -O3 -DNDEBUG  -fcx-limited-range -Wall -Werror=incompatible-pointer-types -Werror=pointer-sign -funsafe-math-optimizations -funsafe-math-optimizations -march=armv8-a+sve
-- BUILD INFO ::: sve2 ::: GNU ::: -O3 -DNDEBUG  -fcx-limited-range -Wall -Werror=incompatible-pointer-types -Werror=pointer-sign -funsafe-math-optimizations -funsafe-math-optimizations -march=armv8-a+sve -march=armv8-a+sve2

The compiler autovectorized some of the code: I found SVE instructions in /build/lib/libvolk.so.3.1.2. Two snippets from the disassembly:

   e2280:	d37ff862 	lsl	x2, x3, #1
   e2284:	a422c1c0 	ld2b	{z0.b-z1.b}, p0/z, [x14, x2]
   e2288:	e40341e0 	st1b	{z0.b}, p0, [x15, x3]
   e228c:	0430e3e3 	incb	x3
   e2290:	25260c60 	whilelo	p0.b, w3, w6
   e2294:	54ffff61 	b.ne	e2280 <volk_32f_8u_polarbutterflypuppet_32f_generic+0x1a90>

   b9b40:       6594a800        scvtf   z0.s, p2/m, z0.s
   b9b44:       65b50482        fmla    z2.s, p1/m, z4.s, z21.s
   b9b48:       65b20001        fmla    z1.s, p0/m, z0.s, z18.s
   b9b4c:       65b30002        fmla    z2.s, p0/m, z0.s, z19.s
   b9b50:       8b070042        add     x2, x2, x7
   b9b54:       25631ca0        whilelo p0.h, x5, x3
   b9b58:       54fffe01        b.ne    b9b18 <volk_16i_32fc_dot_prod_32fc_generic+0x318>

make test reports 100% tests passed, 0 tests failed out of 148.

Sep 20 '25 05:09 wjsota

I have access to an ARM server with SVE support (AWS Graviton3) and have been looking for a suitable project for this ARM Developer lab project.

I would like to take a stab at adding support for SVE. I'm looking to use this to learn about both ARM SVE and performance engineering.

Given that SVE is a SIMD extension like ARM NEON, I can use commit 789fb4d800c1ca738bfcb5a2e76ff4b963df6e49 as a reference, together with this paper written by Nathan West, to see how I can add support for SVE. There's also a learning path by ARM on how to port ARM NEON code to SVE.

May I have your support to try? As I'm somewhat inexperienced, I would like feedback and guidance from you along the way, but I believe I can work mostly independently.

Sep 21 '25 14:09 wjsota

@wjsota your comments slipped through. Thank you very much. If you're still interested, I'd suggest using the code here, adding an implementation for something simple, like a multiplication, and opening a new PR. That'd be something we can discuss.

Dec 24 '25 09:12 jdemel
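
For reference, the "something simple, like a multiplication" suggested above might look roughly like the following. This is only a sketch under assumptions: the function name multiply_32f_sketch and its signature are hypothetical and do not follow VOLK's kernel naming or dispatch conventions. It uses ACLE SVE intrinsics from arm_sve.h when the compiler defines __ARM_FEATURE_SVE (for example with -march=armv8-a+sve, as in the build output above) and falls back to a plain scalar loop otherwise, so it compiles on any machine.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_FEATURE_SVE)
#include <arm_sve.h>
#endif

/* Hypothetical helper, NOT an existing VOLK kernel: out[i] = a[i] * b[i]. */
static void multiply_32f_sketch(float* out, const float* a, const float* b, size_t n)
{
#if defined(__ARM_FEATURE_SVE)
    /* Vector-length-agnostic loop: svcntw() is the number of 32-bit lanes
     * per vector on the machine the code runs on. */
    for (uint64_t i = 0; i < (uint64_t)n; i += svcntw()) {
        /* Predicate is true for lanes with i + lane < n, so the final
         * partial vector needs no separate scalar tail loop. */
        svbool_t pg = svwhilelt_b32_u64(i, (uint64_t)n);
        svfloat32_t va = svld1_f32(pg, a + i);
        svfloat32_t vb = svld1_f32(pg, b + i);
        svst1_f32(pg, out + i, svmul_f32_x(pg, va, vb));
    }
#else
    /* Scalar fallback when SVE is not enabled at compile time. */
    for (size_t i = 0; i < n; i++) {
        out[i] = a[i] * b[i];
    }
#endif
}
```

The whilelt predicate plays the same role as the whilelo instructions in the autovectorized disassembly above: it masks off lanes past the end of the buffer, so one loop body handles any length and any hardware vector width.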