ement.el icon indicating copy to clipboard operation
ement.el copied to clipboard

Add: global minor mode `ement-sync-watchdog-mode'

Open phil-s opened this issue 1 year ago • 8 comments

Global minor mode ement-sync-watchdog-mode periodically re-syncs sessions in case of ement-auto-sync failures. Customizing ement-auto-sync automatically enables/disables the new mode.

Option ement-sync-watchdog-interval determines how frequently the global mode acts (by default, every 5 minutes).

Option ement-sync-watchdog-sessions allows you to constrain which sessions are targeted (by default, all sessions).

Closes #177.

phil-s avatar Aug 27 '24 11:08 phil-s

Feature was tested by connecting to ement -> writing a message -> disconnect from the internet and receive any form of messages while remaining disconnected from the internet -> re-connect to the internet and it auto syncs after the defined time interval.

Very cool feature and thanks for implementing + pushing the change to upstream! LGTM!

Icy-Thought avatar Aug 27 '24 14:08 Icy-Thought

Thanks, Phil.

I'd like to consider simplifying this feature. For example, I think it would be nice if there were just an option that users could enable, like ement-reconnect-p, or something like that, which, when enabled, would automatically run a timer to re-sync a session when a sync times out. That way it wouldn't require a minor mode, nor a timer that runs at all times.

I realize that that would mean discarding most of this code you've written, and that the code you've written works as-is.

Anyway, what do you think?

alphapapa avatar Aug 27 '24 15:08 alphapapa

If you can see and address specific failure cases like timeouts, I'm all for it.

The thing I rather like about my approach is that the reason for the sync failing is irrelevant, and therefore it ought to be pretty robust; whereas "re-sync a session when a sync times out" comes with some assumptions about what it was that went wrong.

I like the idea that even if the failure was due to some edge-case lisp error, something will come along and kick things off again soon enough.

phil-s avatar Aug 27 '24 17:08 phil-s

I guess it might be good if the timer-based re-sync notices when everything is fine and does nothing; but conversely, if it can tell that it needs to take action, it logs that fact to a file. That might then provide some better evidence as to whether or not there's any need for the timer.

phil-s avatar Aug 27 '24 17:08 phil-s

ement-sync-watchdog-mode also has potential use as an alternative to ement-auto-sync -- a user could leave the latter option off, and let the minor mode re-sync at their preferred interval, if they felt that real-time-ish updates were a bit noisy/distracting, but didn't want it to always have to do it manually.

phil-s avatar Aug 27 '24 17:08 phil-s

I like the idea that even if the failure was due to some edge-case lisp error, something will come along and kick things off again soon enough.

I understand the appeal of that, but IME that tends to be a bad idea, especially with Matrix's idempotent token-based syncing: if a sync's events trigger an error in Lisp code, retrying the same sync from the same token will usually trigger the same error again next time. So rather than retrying that sync, potentially infinitely, it's better to stop syncing and show the error so the user can report it and get the bug fixed. (Note that I haven't seen any errors like that in a long time, so they're generally unlikely; at the same time, with Matrix's gradual evolution, it's probably inevitable that such an error will eventually happen again.)

For the simple cases, like the network being disconnected for a while, fixing that would be as simple as running a timer to retry syncing here: https://github.com/alphapapa/ement.el/blob/3df237176262ecf6def8f0766154fea066009a97/ement.el#L548-L552

alphapapa avatar Aug 27 '24 17:08 alphapapa

That sounds like the next step, then.

How about we keep this PR on the back-burner, and I can follow up down the track if I think it's still helping.

Is there currently a way to check when the last sync request for a session was made? (I suspect the session has a server-generated timestamp, but it's the Emacs clock value I'd want). I think it would be handy if we tracked that as well, and then anything that's interested can check how long it's been since the last one. In fact both a "last sync request time" and "last successful sync completion time" per session would be good, I think.

I see a let-bound sync-start-time (time-to-seconds) in ement--sync but at present it's just used for message purposes.

phil-s avatar Aug 28 '24 01:08 phil-s

That sounds like the next step, then.

How about we keep this PR on the back-burner, and I can follow up down the track if I think it's still helping.

Sure. I'll set it as a draft for now.

Is there currently a way to check when the last sync request for a session was made? (I suspect the session has a server-generated timestamp, but it's the Emacs clock value I'd want). I think it would be handy if we tracked that as well, and then anything that's interested can check how long it's been since the last one. In fact both a "last sync request time" and "last successful sync completion time" per session would be good, I think.

I see a let-bound sync-start-time (time-to-seconds) in ement--sync but at present it's just used for message purposes.

No, we rely on the curl process to timeout the request accordingly. AFAIK that is completely reliable. I've never seen curl fail to either finish receiving a request or give a timeout error. That's why the retry logic is in the code I linked, which is triggered by curl's returning an error. If that were not the case, plz itself would have to run internal watchdog timers to timeout curl requests from inside Emacs.

alphapapa avatar Aug 28 '24 16:08 alphapapa