wl-clipboard icon indicating copy to clipboard operation
wl-clipboard copied to clipboard

--watch gives up after a while

Open OJFord opened this issue 2 years ago • 12 comments

I run a systemd user service to keep my primary selection and clipboard in sync; after a while it stops working, manifesting to me as 'I can't paste anything [my selections]'.

When it stops working the status looks like:

     CGroup: /user.slice/user-1000.slice/[email protected]/app.slice/primary-clipboard-sync.service
             ├─ 56846 /usr/bin/wl-paste --primary --watch /usr/bin/wl-copy
             ├─ 70710 /usr/bin/wl-copy
             ├─ 70713 /usr/bin/wl-copy
             └─ 70716 cat

Which seems like an 'extra' wl-copy and cat. Restarting the service resolves the issue, and I just now found that killing the cat process works too.

Let me know if there's anything I can try (when it next occurs) to be more helpful, I realise that's not a lot to go on.

OJFord avatar May 18 '22 21:05 OJFord

Hm, that's odd. What Wayland compositor are you using? Could you try reproducing it on another compositor? Could you try reproducing it by spamming wl-copy and wl-paste perhaps? If you are able to make a consistent reproducer, then WAYLAND_DEBUG=1 logs for the wl-clipboard processes would be helpful.

YaLTeR avatar May 22 '22 13:05 YaLTeR

I'm using sway; I've never used anything else but I can give it a go.. imagine I can just install and launch one from within sway (since I realised accidentally recently that sway will happily start a new sway instance within itself (🤯))?

I'll see if spamming reproduces.

I'll also just add that to the env for systemd service, since it 'reliably' reproduces a few times daily, thanks.

OJFord avatar May 22 '22 19:05 OJFord

imagine I can just install and launch one from within sway

Yeah, you can do that with weston and also with mutter --nested. Trying other wlroots-based compositors is probably gonna yield the same result, so I'd test weston and mutter.

YaLTeR avatar May 22 '22 19:05 YaLTeR

Ok here's the WAYLAND_DEBUG=1 log: failed.log

15:36 is where I noticed it not working; so I assume whatever happened at 15:29 worked as expected.

Status looks as in OP, just different PIDs obviously. wl-clipboard 2.1.0.

OJFord avatar May 24 '22 14:05 OJFord

It seems this might be triggered by certain paste 'targets' (I don't how you'd term it - the focussed area when I paste, via XF86Paste). ... Which seems odd to me, and perhaps isn't the case, since that seems unrelated to watching for copies and pasting them by a different mechanism in the background.

I just had some very uncanny timing where I pasted something into a chat support text field (which itself necessitated restarting the service) and then went to select ('primary selection') something else to paste (from clipboard) and it had already stopped working, even though I'd only just restarted it.

OJFord avatar May 27 '22 22:05 OJFord

Did you have a chance to test other compositors? Might be an issue in wlroots' Xwayland integration.

YaLTeR avatar Jun 12 '22 16:06 YaLTeR

I've also been experiencing this issue a few times each day for a while, but I haven't noticed a pattern of when it fails. Did anyone find a reliable way of reproducing this? Happy to look into other compositors if so.

eddsalkield avatar Aug 29 '22 13:08 eddsalkield

Sorry, I haven't yet tried another compositor, I keep meaning to, I will get to it, but any help appreciated.

It does seem to be very reproducible with an app of my own that uses Tauri - I paste, and I need to restart, can't paste again, I get one paste per restart within the app.

It also seems to affect Firefox URL bar but only occasionally in-page elements.

OJFord avatar Aug 29 '22 21:08 OJFord

Hmm interesting, good to know it's a bit reproducible this way, but it would be nice to get to a point where we have a minimal example script that reproduces the behaviour.

eddsalkield avatar Aug 29 '22 23:08 eddsalkield

Agreed of course, I just mentioned it in case it made someone think 'ah sounds like they're both foobars' - difference between Firefox URL bar and page is interesting in particular I think.

Also because, re:

Might be an issue in wlroots' Xwayland integration.

Firefox isn't using Xwayland.

OJFord avatar Sep 08 '22 07:09 OJFord

Just had a look at reproducing with weston or mutter @YaLTeR - both error with --watch that it requires a compositor that supports wlroots data control protocol. Additionally weston errors with --primary that it does not seem to support primary selection.

Anything else I can try?

OJFord avatar Sep 08 '22 12:09 OJFord

both error with --watch that it requires a compositor that supports wlroots data control protocol

Oh, right, forgot about that. Uhh, not sure then actually, do Smithay-based compositors support wlr-data-control?

YaLTeR avatar Sep 08 '22 15:09 YaLTeR

As far as I can tell (grepping the repo), no, but I haven't tested it because the only public Smithay-based compositor I can find isn't anywhere near complete enough to make that straightforward.

OJFord avatar Sep 22 '22 11:09 OJFord

Anything else I can try here? I'm not sure why it's labelled 'question', seems pretty clearly a bug somewhere (even if not in wl-clipboard) or buggy interaction between things to me?

OJFord avatar Nov 05 '22 00:11 OJFord

Anything else I can try here?

Not sure, sorry. I kinda hope @bugaevc will have a better idea here once he's back (which should be relatively soon hopefully).

I'm not sure why it's labelled 'question', seems pretty clearly a bug somewhere

Well, a bug somewhere, and we're not sure where, hence the question.

YaLTeR avatar Nov 05 '22 01:11 YaLTeR

I can consistently reproduce this in sway 1.8 with qutebrowser 2.5.2 by taking a long time to finish my selection with the mouse. That is, push the mouse button down, move back and forth for a second or so, then release the mouse button. With export WAYLAND_DEBUG=1; wl-paste -pw wl-copy and every application aside from qutebrowser, I don't get any output while moving the mouse around, followed by a bunch when I release it. With qutebrowser, there's constant output while I'm moving the mouse. The only other Qt/python application I have is calibre, and it does not reproduce this behavior.

Logs attached. failed.log success.log

jsbronder avatar Jan 12 '23 19:01 jsbronder

I kinda hope @bugaevc will have a better idea here once he's back

Hi all! and sorry for taking this long to look at this.

I can consistently reproduce this in sway 1.8 with qutebrowser 2.5.2 by taking a long time to finish my selection with the mouse. That is, push the mouse button down, move back and forth for a second or so, then release the mouse button

Thank you, I was able to reproduce that!

             ├─ 56846 /usr/bin/wl-paste --primary --watch /usr/bin/wl-copy
             ├─ 70710 /usr/bin/wl-copy
             ├─ 70713 /usr/bin/wl-copy
             └─ 70716 cat
Which seems like an 'extra' wl-copy and cat.

So the fact that you're seeing two instances of wl-copy is "normal", as in it itself is not a bug. The reason for this is that while wl-copy invocation seems to complete/exit instantly, in reality wl-copy forks off into the background and "serves" the copy requests from other clients; it automatically quits once it sees a different data offer being set as the selection (so, something else having been copied). You can see this for yourself if you run it with --foreground, in which case it will do the same things but without actually forking into background. (Don't try to use --foreground when running wl-copy from wl-paste --watch though; that will break/deadlock.)

And so, if you run wl-paste -pw wl-copy, what should normally happen is:

  1. There is an instance of wl-copy in the background, serving copy requests;
  2. Something is copied into the primary clipboard;
  3. wl-paste -pw spawns a new instance of wl-copy;
  4. That new instance sets its source as the new selection;
  5. The old instance of wl-copy sees this and quits;
  6. GOTO 1.

In your case, we're stuck at step 3: there is the old wl-copy and the new wl-copy (compare the PIDs to figure out which one is which), but we're not proceeding to step 4 for some reason. The cat child you're seeing is the cat that wl-copy spawns during startup to copy its stdin into a temp file (see dump_stdin_into_a_temp_file()). This cat evidently does not complete, and so the new wl-copy does not proceed further (to step 4).

Why does cat not complete? Likely, because the other end of the pipe never got closed. I have not investigated why, but I can speculate that is because we're hitting some unusual situation inside either the compositor (Sway/wlroots) or the client (qutebrowser).

Interestingly, wl-paste itself is in a weird state: namely, in a recursive/reentrant invocation of selection_callback() because of wl_display_roundtrip(). There was a related fix recently, https://github.com/bugaevc/wl-clipboard/pull/133 (cc'ing @mahkoh), but that (intentionally) did not affect the watch mode. This, first of all, explains why this bug only shows up sometimes: there's a race condition that we have to hit for the reentrancy to trigger. And it sounds plausible that us not destroying the previous offer (in the outer invocation of selection_callback()) while asking for the new offer's data (in the nested invocation of selection_callback()) messes with the expectations of qutebrowser, causing it to not send the data for the new offer properly.

The following tentative patch, which prevents dispatching events in the middle of selection_callback(), seems to solve the issue for me:

diff --git a/src/wl-paste.c b/src/wl-paste.c
index e384216..2c702cc 100644
--- a/src/wl-paste.c
+++ b/src/wl-paste.c
@@ -238,7 +238,8 @@ static void selection_callback(struct offer *offer, int primary) {
         popup_surface_destroy(popup_surface);
         popup_surface = NULL;
     }
-    wl_display_roundtrip(wl_display);
+    wl_display_flush(wl_display);

     /* Spawn a cat to perform the copy.
      * If watch mode is active, we spawn

@OJFord @jsbronder could you please test whether that works for you?

That being said, I'm not sure this patch is good:

  • it does not handle EAGAIN, we should do something akin to wl_display_poll() in libwayland-client — but then would that really work if we've written so much stuff that the socket buffer is full, yet we're not reading any incoming messages?
  • maybe instead of trying to prevent reentrancy by not dispatching any events, we should do something to handle the reentrancy better; such as maybe detecting it and destroying the previous offer while signaling the outer invocation that this has happened.

bugaevc avatar Jan 17 '23 12:01 bugaevc

Thank you for the detailed reply @bugaevc, I'll certainly give that a go.

Thanks also for the reproduction @jsbronder! I appreciate it's not a lot to go on without a reasonable way of reproducing, I just wasn't able to find a contained way of doing so, so I'm very glad you did.

OJFord avatar Jan 17 '23 13:01 OJFord

@bugaevc I reproduced the bug again having built from master at time of writing, d83a629822d83b73ffa0f2a6e1c82b1caab30b8d, and then applied your patch above and rebuilt. I don't want to speak to soon, but that seems to fix it for me too, thanks!

OJFord avatar Jan 18 '23 23:01 OJFord

The following tentative patch, which prevents dispatching events in the middle of selection_callback(), seems to solve the issue for me:

diff --git a/src/wl-paste.c b/src/wl-paste.c
index e384216..2c702cc 100644
--- a/src/wl-paste.c
+++ b/src/wl-paste.c
@@ -238,7 +238,8 @@ static void selection_callback(struct offer *offer, int primary) {
         popup_surface_destroy(popup_surface);
         popup_surface = NULL;
     }
-    wl_display_roundtrip(wl_display);
+    wl_display_flush(wl_display);

     /* Spawn a cat to perform the copy.
      * If watch mode is active, we spawn

@OJFord @jsbronder could you please test whether that works for you?

Sorry for the delayed response, I was on holiday. I ran with the above on d83a629822d83b73ffa0f2a6e1c82b1caab30b8d all day today, occasionally comparing both clipboards and they do appear to have stayed synced.

jsbronder avatar Jan 24 '23 02:01 jsbronder

@bugaevc your patch (applied to the latest Debian package 2.1.0-0.1) also solves this issue for me (sway 1.8.1 on Debian Unstable)

hvhaugwitz avatar Feb 22 '23 20:02 hvhaugwitz

Thanks everyone for testing! :slightly_smiling_face: Since there were no regressions reported, I went ahead and committed the patch.

bugaevc avatar Mar 04 '23 19:03 bugaevc