graphene
graphene copied to clipboard
Alignement fault for graphene_vec3_dot with neon
Experienced behavior
Using graphene at runtime has alignment fault on armv7hl with neon. Tried using Fedora 33/34 workstation on armhfp with a jetson-tk1. Build is passing with/without neon enabled in theses distros. Also tests seems to pass in all cases... (so it looks different than #97 that is related to build issues with others functions).
(*): Shared library is missing debugging information.
#0 0xb590e55c in graphene_vec3_dot () from /lib/libgraphene-1.0.so.0
#1 0xb590c320 in graphene_plane_distance () from /lib/libgraphene-1.0.so.0
#2 0xb590c540 in graphene_frustum_intersects_box () from /lib/libgraphene-1.0.so.0
#3 0xb605fafc in clutter_actor_paint () from /usr/lib/mutter-8/libmutter-clutter-8.so.0
#4 0xb605fe14 in clutter_actor_real_paint.lto_priv () from /usr/lib/mutter-8/libmutter-clutter-8.so.0
#5 0xb5eaa744 in meta_stage_paint () from /lib/libmutter-8.so.0
#6 0xb60a9834 in clutter_paint_node_paint () from /usr/lib/mutter-8/libmutter-clutter-8.so.0
#7 0xb60a984c in clutter_paint_node_paint () from /usr/lib/mutter-8/libmutter-clutter-8.so.0
#8 0xb605f5cc in clutter_actor_paint () from /usr/lib/mutter-8/libmutter-clutter-8.so.0
#9 0xb60bf84c in clutter_stage_do_paint_view () from /usr/lib/mutter-8/libmutter-clutter-8.so.0
#10 0xb5eaa8e4 in meta_stage_paint_view () from /lib/libmutter-8.so.0
#11 0xb60e6d94 in paint_stage.isra () from /usr/lib/mutter-8/libmutter-clutter-8.so.0
#12 0xb60dfa04 in clutter_stage_cogl_redraw_view () from /usr/lib/mutter-8/libmutter-clutter-8.so.0
#13 0xb5eb3e08 in meta_stage_x11_redraw_view () from /lib/libmutter-8.so.0
#14 0xb60c2868 in handle_frame_clock_frame () from /usr/lib/mutter-8/libmutter-clutter-8.so.0
#15 0xb6086bc4 in frame_clock_source_dispatch () from /usr/lib/mutter-8/libmutter-clutter-8.so.0
#16 0xb69c5a7c in g_main_context_dispatch () from /lib/libglib-2.0.so.0
#17 0xb6a2078c in g_main_context_iterate.constprop () from /lib/libglib-2.0.so.0
#18 0xb69c4eb4 in g_main_loop_run () from /lib/libglib-2.0.so.0
#19 0xb5ef0ce4 in meta_run () from /lib/libmutter-8.so.0
#20 0x0049198c in main ()
(gdb) disassemble
Dump of assembler code for function graphene_vec3_dot:
0xb590e554 <+0>: vld1.64 {d18-d19}, [r1 :128]
0xb590e558 <+4>: vld1.64 {d16-d17}, [r0 :128]
=> 0xb590e55c <+8>: vmul.f32 q8, q8, q9
0xb590e560 <+12>: vpadd.f32 d18, d16, d16
0xb590e564 <+16>: vadd.f32 d18, d18, d17
0xb590e568 <+20>: vmov.32 r3, d18[0]
0xb590e56c <+24>: vmov s0, r3
0xb590e570 <+28>: msr SP_hyp, lr, lsl pc
End of assembler dump.
Expected behavior
Fedora workstation should work with wayland at runtime on the device.
Steps to reproduce
install a build of graphene with neon support enabled (current workaround is to disable neon support in graphene).
Operating system in use
Reproduced on Fedora 33/34 armhfp (armv7hl)
SIMD implementation in use
ARM neon
Fedora on ARMv7 distro flags don't enable NEON by default, it should really be a runtime detected optimization as not all arm platforms are guaranteed to have NEON available.
Run time detection does not work with all the inlining going on, so it's a non starter.
Thanks for your answer.
I'm not sure to understand.:
- For the Fedora perspective, we cannot assume neon, so it will have to be disabled (unless using hwcap and alternatives build).
- In the jetson-tk1 case, neon is available. So It should works, but actually fails...
If NEON support is compiled in, then yes: it should work. I only test it on an RPi 3b+, though, because it's the only ARM device I have available.
The dot3_scalar
operator, which is called by the dot3
operator, is implemented as:
float32x4_t __mul = vmulq_f32 (a, b);
float32x2_t __s1 = vpadd_f32 (vget_low_f32 (__m), vget_low_f32 (__m));
float32x2_t __res = vadd_f32 (__s1, vget_high_f32 (__m));
float res = vget_lane_f32 (__res, 0);
Just a random idea: Could it be an LTO issue? @kwizart, have you tried disabling LTO for graphene and see if it helps?
This bug has been playing me for a long time as well, even in Fedora 33 being unable to start gnome-shell due to this.
I'm running it on a Odroid XU4 which properly supports neon but even building with: -mfpu=neon-vfpv4
which is the one for my platform, it will segfault.
Building without neon fixes it here as well.
https://bodhi.fedoraproject.org/updates/FEDORA-2021-f000eb2320 (F33) and https://bodhi.fedoraproject.org/updates/FEDORA-2021-cb9771bb01 (F34) disable neon for Fedora (thanks kwizart!) if you want to test and karma the updates.
Thanks! This will make gnome-shell on F33 and F34 run on armv7 based devices again !
Edit: Removed tests for rpi3/4 as I don't reproduce there on f33 after double checks.
@kalev, I confirm that disabling lto is without effect.
I have encountered a very similar crash, also on an ARMv7l architecture, albeit within graphene_vec3_init()
instead.
Text of alignment trap:
Alignment trap: videosrc_queue: (666) PC=0x660e1a88 Instr=0xf9400aef Address=0x6997a648 FSR 0x811
From GDB:
Thread 18 "videosrc_queue:" received signal SIGBUS, Bus error.
[Switching to Thread 0x5edf3310 (LWP 666)]
0x660e1a88 in graphene_vec3_init (v=v@entry=0x6997a648, x=x@entry=0, y=y@entry=0, z=z@entry=1) at /usr/lib/arm-poky-linux-gnueabi/gcc/arm-poky-linux-gnueabi/10.2.0/include/arm_neon.h:10398
10398 /usr/lib/arm-poky-linux-gnueabi/gcc/arm-poky-linux-gnueabi/10.2.0/include/arm_neon.h: No such file or directory.
(gdb) bt
#0 0x660e1a88 in graphene_vec3_init (v=v@entry=0x6997a648, x=x@entry=0, y=y@entry=0, z=z@entry=1) at /usr/lib/arm-poky-linux-gnueabi/gcc/arm-poky-linux-gnueabi/10.2.0/include/arm_neon.h:10398
#1 0x6611562a in gst_gl_transformation_build_mvp (transformation=0x6997a148) at ../gst-plugins-base-1.16.3/ext/gl/gstgltransformation.c:296
#2 0x6611589c in gst_gl_transformation_set_caps (filter=<optimized out>, incaps=<optimized out>, outcaps=<optimized out>) at ../gst-plugins-base-1.16.3/ext/gl/gstgltransformation.c:487
#3 0x6927e192 in gst_gl_filter_set_caps (bt=0x6997a148, incaps=0x60778590, outcaps=0x60778590) at ../gst-plugins-base-1.16.3/gst-libs/gst/gl/gstglfilter.c:773
#4 0x75737eba in ?? () from /usr/lib/libgstbase-1.0.so.0
(gdb) disas graphene_vec3_init
Dump of assembler code for function graphene_vec3_init:
0x660e1a58 <+0>: push {lr}
0x660e1a5a <+2>: movs r3, #0
0x660e1a5c <+4>: sub sp, #28
0x660e1a5e <+6>: ldr r2, [pc, #64] ; (0x660e1aa0 <graphene_vec3_init+72>)
0x660e1a60 <+8>: str r3, [sp, #16]
0x660e1a62 <+10>: add r3, sp, #4
0x660e1a64 <+12>: vstr s0, [sp, #4]
0x660e1a68 <+16>: add r2, pc
0x660e1a6a <+18>: vstr s1, [sp, #8]
0x660e1a6e <+22>: vstr s2, [sp, #12]
0x660e1a72 <+26>: vld1.32 {d16-d17}, [r3]
0x660e1a76 <+30>: ldr r3, [pc, #44] ; (0x660e1aa4 <graphene_vec3_init+76>)
0x660e1a78 <+32>: ldr r3, [r2, r3]
0x660e1a7a <+34>: ldr r2, [pc, #44] ; (0x660e1aa8 <graphene_vec3_init+80>)
0x660e1a7c <+36>: ldr r3, [r3, #0]
0x660e1a7e <+38>: str r3, [sp, #20]
0x660e1a80 <+40>: mov.w r3, #0
0x660e1a84 <+44>: ldr r3, [pc, #28] ; (0x660e1aa4 <graphene_vec3_init+76>)
0x660e1a86 <+46>: add r2, pc
=> 0x660e1a88 <+48>: vst1.64 {d16-d17}, [r0 :128]
0x660e1a8c <+52>: ldr r3, [r2, r3]
0x660e1a8e <+54>: ldr r2, [r3, #0]
0x660e1a90 <+56>: ldr r3, [sp, #20]
0x660e1a92 <+58>: eors r2, r3
0x660e1a94 <+60>: bne.n 0x660e1a9c <graphene_vec3_init+68>
0x660e1a96 <+62>: add sp, #28
0x660e1a98 <+64>: ldr.w pc, [sp], #4
0x660e1a9c <+68>: blx 0x660d69a8 <__stack_chk_fail@plt>
0x660e1aa0 <+72>: adds r4, #188 ; 0xbc
0x660e1aa2 <+74>: movs r1, r0
0x660e1aa4 <+76>: lsls r4, r1, #2
0x660e1aa6 <+78>: movs r0, r0
0x660e1aa8 <+80>: adds r4, #158 ; 0x9e
0x660e1aaa <+82>: movs r1, r0
End of assembler dump.
(gdb) info registers
r0 0x6997a648 1771546184
r1 0x60778590 1618445712
r2 0x660f4f28 1712279336
r3 0x8c 140
r4 0x6997a148 1771544904
r5 0x5edf1740 1591678784
r6 0x6997a488 1771545736
r7 0x6997a508 1771545864
r8 0x6997a4c8 1771545800
r9 0x5edf1720 1591678752
r10 0x6997a648 1771546184
r11 0x5edf1780 1591678848
r12 0x6613ab20 1712565024
sp 0x5edf16f0 0x5edf16f0
lr 0x6611562b 1712412203
pc 0x660e1a88 0x660e1a88 <graphene_vec3_init+48>
cpsr 0x600b0030 1611333680
fpscr 0x28000013 671088659
If my read is correct, it looks like the code within graphene_vec3_init()
is assuming 16-byte alignment for the value
member within v
, but as we see from the address (0x6997a648
) it's actually only aligned to an 8-byte boundary. The function is invoked from here in GStreamer.
Happy to file as a separate issue instead.
I can also confirm @jeremy-hiatt segfault when the fuction is called from gstreamer. But this one also happens when building without neon.
Apologies, the error @jeremy-hiatt was referring to, isn't happening when graphene has been built without neon.
In 1.10.5, released on February 9, I landed a couple of changes to the alignment annotations; looking at the links on Bodhi that @kalev posted, it seems F33 and F34 still use 1.10.4. Would it be possible for people on ARM to test 1.10.5?
Hm, I only see 1.10.4 on both https://github.com/ebassi/graphene/releases and https://github.com/ebassi/graphene/tags
@kalev You're absolutely right—I got confused with the post-release version bump.
So I'm really confused as to why there are alignment issues, considering that the structures are marked for alignment as required by vectorised types.
@ebassi One thing I'll admit I don't understand very well is how the alignment annotations are expected to interact with struct
nesting. The offending graphene_vec3_t
instance is declared as the camera_position
member within the GstGLTransformation
type here. We can see from the stack trace I posted that the base address for that structure was 0x6997a148
; i.e. the camera_position
was 16-byte aligned with respect to its parent (offset of 0x500
), but that allocation wasn't on a 16-byte alignment, therefore pushing camera_position
off alignment as well.
The part I'm really fuzzy on is whether there's supposed to be anything within GStreamer or GLib in general that's smart enough to allocate heap memory with the correct alignment for this type, and if so, why it's not working here.
I tested with 1.10.4 (was previously using 1.10.2) but no difference. I checked the reported alignment via G_ALIGNOF(GstGLTransformation)
and it looks like it's getting calcluated correctly, since it's returning a value of 16. Looking through the GLib code though, I can't find any evidence that it cares at all about this alignment value when it comes to heap allocations made via g_object_new()
/g_type_create_instance()
.
I posted a pretty minimal example that triggers the alignment fault reliably on my board. Perhaps @ebassi you can try this on your RPi to see if it works for you?
There are no special alignment guarantees for heap-allocated memory via malloc(3)
, the compiler can only make such guarantees for stack allocations.
Structures that may be placed on the heap must be able to manage their own alignment by padding themselves and applying an offset as part of their API according to their given address.
This indeed complictes things, like when such structures are manually copied from one heap-allocated address to another where they require a different alignment. It is advised to provide a "copy" API that properly realigns the content of the structure to the destination address.
There may be other tricks to deal with this requirement, but I'm personally not familiar with them.
I run into the same alignment trap like @jeremy-hiatt, tested with graphene 1.9.2 as well as 1.10.6 on an ARM Cortex A9.
I posted a pretty minimal example that triggers the alignment fault reliably on my board.
Yes, same here.
After investigating the issue pointed out by @jeremy-hiatt further, I am now of the opinion that gstreamer should pass 16-byte-aligned memory to graphene. To do this, I created this gstreamer merge request gltransformation: pass 16-byte-aligned memory to graphene. Feedback welcome!
I'm also getting alignment faults in more then one function. I'm trying to get archlinuxarm gnome running on a rk3288 box.
If you need more info then tell me what I can send.
Seems like a related issue is causing problems with GTK4 (upstream issue); there we're getting SIGBUS errors on Alpine Linux on armv7 and armhf, and the exact function that crashes is graphene_vec4_init_from_vec4
(called from gsk_color_matrix_node_new
) - disabling neon fixes it. Does GTK need a similar patch to what was done with GStreamer?
No, we're not going to allocate pointers to graphene structures in GTK.
The main problem is alignment of GObject: https://gitlab.gnome.org/GNOME/glib/-/issues/1231