
Repeatable Xwayland crash ("request could not be marshaled")

Open jlindgren90 opened this issue 2 years ago • 4 comments

I'm not sure if this is an Xwayland, wlroots, or labwc issue -- possibly all three.

Versions

  • wayland 1.20.0
  • xorg-xwayland 22.1.0
  • wlroots git 511f137f (2022-02-07)
  • labwc git edc73550 (wlroots-git branch, 2022-02-22)

Steps to reproduce

  • Start labwc
  • Run "GDK_BACKEND=x11 geany"
  • Continuously resize Geany's window for >30 seconds

Expected results

  • Xwayland does not crash

Actual results

  • Xwayland crashes with the following error:
(EE)
Fatal server error:
(EE) request could not be marshaled: can't send file descriptor
(EE)

Additional information

The crash can be reproduced much more quickly if labwc's execution is slowed down by running it in valgrind.

The error message comes from libwayland-client, which Xwayland uses.

The immediate cause of the crash appears to be that the sendmsg() call in wl_connection_flush() results in EAGAIN, and Xwayland treats the error as fatal.

As an experiment, I removed the MSG_DONTWAIT flag from the sendmsg() call. The result is a deadlock between labwc and Xwayland: both appear to be blocked waiting for a response from the other.
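For readers unfamiliar with the failure mode, here is a minimal sketch of what EAGAIN from a non-blocking send looks like. It uses a plain Unix socket pair in Python on Linux, not libwayland; the helper names fill_until_eagain and send_all_with_poll are hypothetical and exist only for this illustration. The first helper fills the kernel send buffer until the send would block, and the second retries after waiting for the socket to become writable instead of treating the error as fatal.

```python
import select
import socket


def fill_until_eagain(sock, chunk=b"x" * 4096):
    """Send with MSG_DONTWAIT until the kernel buffer is full.

    Returns the number of bytes queued before EAGAIN was hit.
    (Hypothetical helper, for illustration only.)
    """
    total = 0
    while True:
        try:
            total += sock.send(chunk, socket.MSG_DONTWAIT)
        except BlockingIOError:  # EAGAIN / EWOULDBLOCK
            return total


def send_all_with_poll(sock, data, timeout_ms=1000):
    """Send all of data, waiting for POLLOUT whenever EAGAIN occurs.

    Returns True once everything is sent, False if the peer did not
    make room within timeout_ms. (Hypothetical helper.)
    """
    poller = select.poll()
    poller.register(sock, select.POLLOUT)
    view = memoryview(data)
    while view:
        try:
            sent = sock.send(view, socket.MSG_DONTWAIT)
            view = view[sent:]
        except BlockingIOError:
            # Buffer full: wait until writable rather than aborting.
            if not poller.poll(timeout_ms):
                return False
    return True
```

Note that waiting for writability only helps if the peer eventually reads. If the peer is itself blocked waiting on this process, as described in this report, the wait never completes, which matches the deadlock seen once MSG_DONTWAIT was removed.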

I have no clue how to even start fixing this. The whole architecture seems broken if we have two single-threaded processes (the Wayland compositor and the X server) that both need a response from the other in order to proceed without crashing.

Thread 1 (Thread 0x7f3330969e00 (LWP 52068) "labwc"):
#0  0x00007f33338642af in poll () at /usr/lib/libc.so.6
#1  0x00007f333364263b in  () at /usr/lib/libxcb.so.1
#2  0x00007f333364408f in  () at /usr/lib/libxcb.so.1
#3  0x00007f33336441a2 in xcb_wait_for_reply () at /usr/lib/libxcb.so.1
#4  0x00007f33340dbed3 in read_surface_property (xwm=0x564a3b81a050, xsurface=0x564a3b82f9d0, property=349) at ../xwayland/xwm.c:800
#5  0x00007f33340df1e1 in xwm_handle_property_notify (ev=0x564a3b5b4fa0, xwm=0x564a3b81a050) at ../xwayland/xwm.c:1121
#6  x11_event_handler (fd=<optimized out>, mask=<optimized out>, data=<optimized out>) at ../xwayland/xwm.c:1637
#7  x11_event_handler (fd=<optimized out>, mask=<optimized out>, data=<optimized out>) at ../xwayland/xwm.c:1589
#8  0x00007f3334130297 in post_dispatch_check (loop=0x564a3b44eb90) at ../wayland-1.20.0/src/event-loop.c:943
#9  wl_event_loop_dispatch (loop=0x564a3b44eb90, timeout=timeout@entry=-1) at ../wayland-1.20.0/src/event-loop.c:1034
#10 0x00007f333412dd37 in wl_display_run (display=0x564a3b4689d0) at ../wayland-1.20.0/src/wayland-server.c:1408
#11 0x0000564a39a61df4 in main (argc=<optimized out>, argv=<optimized out>) at ../src/main.c:82

Thread 1 (Thread 0x7fca124be900 (LWP 52103) "Xwayland"):
#0  0x00007fca130a0437 in sendmsg () from /usr/lib/libc.so.6
#1  0x00007fca134ee9cf in wl_connection_flush (connection=connection@entry=0x558339081c50) at ../wayland-1.20.0/src/connection.c:315
#2  0x00007fca134ef3c2 in wl_connection_flush (connection=0x558339081c50) at ../wayland-1.20.0/src/connection.c:297
#3  wl_connection_put_fd (fd=<optimized out>, connection=0x558339081c50) at ../wayland-1.20.0/src/connection.c:434
#4  copy_fds_to_connection (closure=closure@entry=0x5583394723e0, connection=connection@entry=0x558339081c50) at ../wayland-1.20.0/src/connection.c:1057
#5  0x00007fca134f04c3 in wl_closure_send (closure=0x5583394723e0, connection=0x558339081c50) at ../wayland-1.20.0/src/connection.c:1217
#6  0x00007fca134ecc16 in wl_proxy_marshal_array_flags (proxy=proxy@entry=0x558339087e50, opcode=opcode@entry=0, interface=interface@entry=0x5583371b3820 <wl_shm_pool_interface>, version=version@entry=1, flags=flags@entry=0, args=args@entry=0x7fff59d69f80) at ../wayland-1.20.0/src/wayland-client.c:852
#7  0x00007fca134ecea0 in wl_proxy_marshal_flags (proxy=proxy@entry=0x558339087e50, opcode=opcode@entry=0, interface=0x5583371b3820 <wl_shm_pool_interface>, version=1, flags=flags@entry=0) at ../wayland-1.20.0/src/wayland-client.c:784
#8  0x0000558336fdeef2 in wl_shm_create_pool (size=2304, fd=7, wl_shm=0x558339087e50) at /usr/include/wayland-client-protocol.h:1932
#9  xwl_shm_create_pixmap (screen=<optimized out>, width=<optimized out>, height=<optimized out>, depth=32, hint=<optimized out>) at ../xwayland-22.1.0/hw/xwayland/xwayland-shm.c:271
#10 0x000055833703db9e in ProcCreatePixmap (client=0x5583393eadd0) at ../xwayland-22.1.0/dix/dispatch.c:1508
#11 0x00005583370461be in Dispatch () at ../xwayland-22.1.0/dix/dispatch.c:550
#12 0x0000558336fd4250 in dix_main (envp=<optimized out>, argv=<optimized out>, argc=<optimized out>) at ../xwayland-22.1.0/dix/main.c:271
#13 main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at ../xwayland-22.1.0/dix/stubmain.c:34

jlindgren90 · Feb 22 '22 09:02

Weston suffers from the same issue: https://gitlab.freedesktop.org/wayland/weston/-/issues/589

jlindgren90 · Feb 22 '22 23:02

Interesting point regarding Xwayland causing FD exhaustion in that bug report. I guess we should also try to raise the FD limit to the hard limit when running Xwayland.
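As a rough illustration of what "set the FD limit to the hard limit" means, here is a minimal sketch using Python's resource module on Linux. It is not the labwc change itself (labwc is written in C and would use getrlimit()/setrlimit() directly), and the function name raise_nofile_to_hard_limit is hypothetical.

```python
import resource


def raise_nofile_to_hard_limit():
    """Raise the soft RLIMIT_NOFILE of this process to its hard limit.

    Returns the (soft, hard) pair in effect afterwards.
    (Hypothetical helper, for illustration only.)
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft != hard:
        # An unprivileged process may raise its soft limit up to the hard limit.
        resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)
```

Child processes inherit the limit, so a compositor that raises it before spawning Xwayland passes the higher limit along. A compositor may want to restore the original soft limit for other children, since some programs misbehave with very large limits.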

Consolatis · Feb 24 '22 20:02

Something like https://gitlab.freedesktop.org/wayland/wayland/-/merge_requests/213 is required to fix this, in addition to the ulimit -n increase.

jlindgren90 · Feb 27 '22 21:02

Nice patch :)

johanmalm · Feb 27 '22 22:02

Is this still an issue, or can we close it?

We increased the file descriptor limit in 6c2bbb42ea612831da826cd80fdf71e8ec7c6079 and 722aa042b7a270d8da3f2a7c78c77dd59d67eb9b.

Consolatis · Dec 20 '22 19:12

I have not re-checked just now, but I think this can still occur unless something like https://gitlab.freedesktop.org/wayland/wayland/-/merge_requests/276 is applied to libwayland.

jlindgren90 · Dec 22 '22 22:12

I cannot reproduce this any more. Many things have changed since then (a switch from an Nvidia to an AMD GPU, wayland and xwayland updates, wlroots and labwc updates), so it's hard to identify what may have fixed it.

With XWAYLAND_NO_GLAMOR=1 (an uncommon case anyway), I can still get the Xwayland process to hang using 100% CPU for a long time (>5 minutes), but it doesn't seem to crash.

jlindgren90 · Dec 01 '23 17:12