notify-send.sh icon indicating copy to clipboard operation
notify-send.sh copied to clipboard

several things...

Open bkw777 opened this issue 5 years ago • 11 comments

I had a few problems with this script, and ended up resolving them, and in the process did a complete re-write, or refactor more exactly.

I don't think you want my stylistic chnages (but I do) so I don't have a pull-request, however, I re-created a fresh git commit history in a fork where I tried to do most of the stylistic stuff in the first couple commits and then the rest of the commits show the functional changes more clearly.

You may or may not be interested, but here is the fork with a commit history that should be more or less readable so you can see how I got rid of all the seds and other unneeded external tools like bc and dc etc, and resolved my main problem which was background processes that just accumulated forever because of not really handling the conditions that can happen with notifications.

I tried to make each commit add one conceptual change. Not always perfectly but pretty close mostly. You could also just look at the final/current version and probably get it. https://github.com/bkw777/notify-send.sh

This version is now working well and bulletproof for me.

Two things I did which I want in my copy that you may not want are, 1 - I execute the action command directly, not with "bash -c". If I really need to execute a command that needs bash interpretation, like redirection, I'll just run bash as the command, and bonus, it doesn't have to be bash. Otherwise, the 99% common case is running a binary or script with args, and I'd rather optimize for that, no unnecessary extra shell process. 2 - I removed the use of printf %q on the action command in send.sh. I haven't missed it, and I think it was the cause of a cosmetic ugliness where the -d command or any -o command with empty button label, produced a button that looked like [ ' ' ] (I undertstand that could be dependat on the particular notification daemon, still, I no longer have that problem. Now if I specify and emty label, I get a clean blank button.) However, I haven't tried every crazy possible command string, just some basics that do include spaces and embedded escaped quotes etc, and they all worked. It's possible that without the %q there are command strings that used to work that now don't.

bkw777 avatar May 25 '20 19:05 bkw777

@bkw777 I'm sorry for a delay. Thank you for the write-up, definitely interesting.

Would you like to elaborate on the conditions preventing cleanup of background processes in your environment and how you have resolved the issue?

vlevit avatar Jun 27 '20 19:06 vlevit

This: https://github.com/bkw777/notify-send.sh

Says: For a lengthy description, see: https://github.com/bkw777/mainline/blob/master/lib/notify_send/mainline_changes.txt

Which includes: Changes to notify-send.sh ...

  • "setsid" to free ourself from the action handler and explicit close child processes, so that we can exit and leave one less unneccessary background process waiting around the whole time the dbus monitor process is waiting for a ActionInvoked or NotificationClosed event, which may never come.

and

Changes to notify-action.sh ...

  • "setsid" to free ourself from the invoked action process so we can exit ... There is a bit to explain about the temp file and gdbus child process handling. The two sections, "kill obsolete monitors" and "kill current monitor" must orchestrate with each other, and with other instances of ourself. Basically it's normal for our gdbus process to be killed from outside by another instance of ourself, and it's likewise normal for us to have to kill other instances of ourself.

=== kill obsolete monitors ===

This has to orchestrate with the "kill current monitor" section. It's not an error to find one or more old pid files, with or without matching processes still active.

Notification events are not predictable, so when setting up a monitor to catch an event (button press or notification closed), you never know how long you'll wait for the event, or if the event will ever come.

You may never get a button press, you may never get a notification-closed, the notification may never expire.

The converse is also true. The notification may be re-opened from the notification daemon's history, and so both action events and on-close events can be sent multiple times, and sent after we have exited and are no longer around to respond to them. Different desktop environments have different notification daemon programs which all behave differently from each other in various ways.

And the --replace-id and --replace-file options for sending a new notification to update an existing notification means that often, one or more of these existing monitor processes is now "obsolete" because we are replacing it. But they have no way to know that their job is done and should exit. They will never end gracefully on their own, because they are waiting for something that never came, and now never will come (a dbus event matching either a button press or a notification-close for the notification ID they are watching). So every time we start up, before starting the new/current monitor, we need to find any obsolete monitors and close them.

"Obsolete" is defined as matching all of the following criteria:

  • Pid file with the same APP_NAME as us

  • Older than ourself

    Using [[ file -ot file ]] to compare relative ages of our pid files to get reliable comparison even for sub-second differences, without needing to record and compare timestamps and without needing to spwan a child just to run "date" just to get sub-second times which you can't get from bash's built-in printf %()T

  • From the same desktop session

    Using $DISPLAY from the pid file to compare with ours.

    This avoids clobbering processes form another session. IE, you can have multiple desktop sessions on multiple virtual consoles, or remote desktop sessions, all for the same user on the same machine at the same time. All concurrent sessions will have different $DISPLAY values. So if we record $DISPLAY in the pid file we can safely tell whether it's OK to consider killing the PID.

    DISPLAY values are like PIDs in that they are only unique at any given moment. They get re-used and are not unique over time. That means it's possible to pick up a pid file with a matching $DISPLAY, but really the pid is from some other past session rather than our current one. That's OK. Any monitor from a logged out session is automatically and by definition obsolete because it's an orphan.

  • Watching the same notification ID we are about to watch.

    The notification ID is recorded in the pid file along with PID and DISPLAY

If all of those are true at the same time for a given PID, then that's a monitor we should kill because we are it's replacement.

And when we kill it, remove the pid file before killing the PID.

The pid is for a child gdbus process, not it's parent notify-action.sh bash process.

When we kill a gdbus process, it's parent notify-action.sh breaks out of it's "while loop" and proceeds to try to do it's own clean-up, the same as if it had concluded normally by processing an event. But we have already killed the PID that it would normally expect to kill. That PID either no longer exists, or if it exists, it's some other unrelated process that we should NOT kill.

So here, remove the pid file before killing, so that it's impossible for the related notify-action.sh to pick up an invalid PID and try to kill it.

The first [[ -s file ]] catches the case when the globbing pattern didn't match any files and the variable contains the globbing pattern itself.

=== kill current monitor ===

This is already described above, but just to go over it from this parts point of view...

This has to orchestrate with the "kill obsolete monitors" section. It's not an error if the pid file we know we wrote is missing.

While we have been waiting for a dbus event which never came, another new instance of ourself may have started up, found our pid file, and killed our gdbus child process. When that happens, the while loop above ends and we fall through to here, and the pid file we wrote will be missing. This is intentional. When the gdbus process is killed, the PID we wrote into the pid file is no longer valid and we should NOT just indiscriminately try to kill that PID any more, because some other random unrelated process may now occupy that PID. So during "kill obsolete", we remove the pid file before killing the pid, so that it's impossible to pick up an invalid PID here. So here, if the file doesn't exist, just exit without error.


So the two main things are the use of setsid just makes it so that when a parent script runs a child script in the background, the parent can actually fully exit and the child neither prevents the parent proc from exiting, nor spams the parents tty with any output. It's basically a more thorough "nohup foo &" Which was mostly a cosmetic issue.

And the bigger thing was all the stuff about killing obsolete monitors.

bkw777 avatar Jun 27 '20 20:06 bkw777

--snip-- "setsid" to free ourself from the action handler and explicit close child processes, so that we can exit --snip--

"explict close child process" looks unclear now. I changed the name/description of the --force feature to "explicit-close". So that line refers to the child process we launch to perform the explicit-close after the specified timeout.

Reworded:

Use "setsid" to free ourself from the child processes for the action handler and the explicit-close, so that we can exit and leave one less unneccessary background process waiting around the whole time the dbus monitor process is waiting for a ActionInvoked or NotificationClosed event, which may never come.

bkw777 avatar Jun 27 '20 20:06 bkw777

Without that kill-obsolete-monitor setup, it was creating a new background process every time I updated the notification (using an ID to update an already-existing notification). And even if the button press action ever comes, it would only be processed by the most recent action handler. All the previous action handlers would just wait in the bg forever. And without the setsid, it was creating 2 bg processes for every notification update not just one.

bkw777 avatar Jun 27 '20 20:06 bkw777

Thanks again for elaborating, much appreciated. To give some context the current notify-send.sh behavior has been tested against GNOME and for me the background processes cleanup I think always worked: either with button click, close event, click on area of notification (default action), or "clear all events" (NotificationClosed is sent for all notifications regardless of timeouts). I will try to have a closer look at your more robust solution and I might borrow some your ideas for the the next version of notify-send.sh.

vlevit avatar Jun 28 '20 20:06 vlevit

Excuse me barging in, but I took the liberty of looking at the script and reworking it a bit myself. My intent was slightly different: to make it POSIX-compliant. As such, it can be run under a shell such as, say, dash, which could potentially yield better performance.

As is, it is currently under my dotfiles, but unlike @bkw777's wonderful efforts, it lacks a history... I just hacked away at it for a while until I was satisfied. Nonetheless, I am confident it is still very near and dear to the original notify-send and as a drop-in replacement to it as is to the original script.

There are still some tools being used in it, such as tr for hint format validation (because I'm lazy) and od for action key random number generation (polling from /dev/urandom). On this last point, I haven't worked much on the actions front since it isn't currently a primary concern since I don't make use of them. Skimming through the issue a tad bit, I liked @bkw777's take on it, might investigate further. However, if it works, it'll probably stay as is for now.

If POSIX compliance is a concern, might wanna take a look at it. If not, maybe for curiosity's sake. Just felt that, after some trials and tribulations, thought I'd share. Which leads me to conclude, after all this time, that I'm very thankful for your script! Definitely prefer it over the python variant.

Addendum: I didn't see support in any of the scripts for the boolean type, which is a part of the Desktop Notification Spec since 1.2. It was trivially supported, value just falls through to gdbus and we pray :crossed_fingers:.

rgcv avatar Aug 12 '20 23:08 rgcv

@rgcv Thanks for sharing and kind words. I can definitely see the value for shell script to be POSIX-compliant, but I don't have experience with it so unless someone joins me to co-maintain it I will keep the script bash-only. But I am more than happy to provide links to your and @bkw777's forks in README if you wish (feel free to open PR). Good catch for boolean type, we should definitely add it.

vlevit avatar Aug 14 '20 07:08 vlevit

I wouldn't mind lending a hand in turning it into something that's POSIX-compliant. My copy of notify-send.sh already is, if I'm not mistaken. I don't think I've missed anything (I rely on shellcheck a lot, trusting it whole-heartedly that it can spot most of the quirks between POSIX sh and bash).

What's missing now is notify-action.sh, but that can be a bash script for the time being. I haven't properly looked at it, but I wanted to, with some time, look back at @bkw777's efforts as well. It might be possible to get the best of both worlds! Merge my efforts with his and get a POSIX compliant notify-send.sh with robust action background process handling.

rgcv avatar Aug 14 '20 07:08 rgcv

@rgcv Sounds like a plan. I can review PRs for POSIX compliance of the scripts and more robust handling of background processes (preferably separate but we can try to merge both at the same time too). I think for the latter it would really useful if we test the changes under different environments / notification daemons. I can test in gnome and if I have enough time in yet another distinct environment.

vlevit avatar Aug 14 '20 08:08 vlevit

@vlevit Awesome, that'd be neat. I can potentially test under Wayland with sway/mako and in i3/dunst.

I still have to look into the actions bit of the ordeal. For my part, I can just try and translate it to POSIX sh.

rgcv avatar Aug 14 '20 08:08 rgcv

Hi, I just would like to thank you, your script simplify life and wast time, i spend lot of hours to find solution of my require and congratulation of your job. Yes my bug report isn't a bug report:-D . i will include link of your script into my project. Again, thank you

surfzoid avatar Dec 13 '22 18:12 surfzoid