lita-hipchat icon indicating copy to clipboard operation
lita-hipchat copied to clipboard

Auto reconnect capability

Open shusugmt opened this issue 10 years ago • 13 comments

It would be nice if it reconnects automatically when connection to HipChat has lost.

I run my bot on my laptop. Once I close my PC, lita never reconnects automatically so I need to stop and launch it again every time, which is a bit annoying.

shusugmt avatar Jan 05 '15 01:01 shusugmt

I've written a poc. @jimmycuadra before I update specs, can you see my implementation? s2ugimot/lita-hipchat@0d9369d1cc884384498279b9eafe910b6929324f

shusugmt avatar Jan 05 '15 01:01 shusugmt

Thanks for the POC. I think trying to manage reconnection within the client like this is probably a little too complex. What do you think about making Lita crash (or just calling shut_down explicitly) when the connection is lost? Then it can easily be restarted by the user's preferred process manager/init system. I like the idea of just letting it crash and then starting up again with a totally clean slate.

jimmycuadra avatar Jan 08 '15 11:01 jimmycuadra

@jimmycuadra Thank you. Your idea sounds better. I'll write a new poc on other branch.

shusugmt avatar Jan 09 '15 00:01 shusugmt

Just an opinion, but it feels to me like you're hitting a nail with a sledgehammer by crashing lita. What if lita is using more than 1 adapter, like slack and hipchat? In this case, if hipchat goes down for more than a few seconds, lita will be in a join/leave cycle in slack until hipchat comes back up.

Also, many process managers will only attempt to restart a service a finite number of times before it gives up. If hipchat were down for long enough, it wouldn't ever be restarted if the process manager gave up.

justintime avatar Apr 30 '15 16:04 justintime

A single Lita process can't be used with more than one adapter, so that is not a problem in this case. I'm not aware of a process manager that can't keep attempting to restart a process infinitely – do you have examples? What would you suggest rather than letting the process exit?

jimmycuadra avatar Apr 30 '15 19:04 jimmycuadra

While I'm not new to relying on external services from long running processes, I'm really new to lita. Thanks for pointing out the single adapter thing.

systemd will often use restart=on-failure, which if lita exits with status 0, it wouldn't restart the service. In upstart, there's respawn-limit.

As to how to deal with it, I'd think that applying graceful degradation similar to the way @s2ugimot did in s2ugimot/lita-hipchat@0d9369d would require less work from the sysadmin and would ultimately be a more "set-it-and-forget-it" approach.

Hubot does it similarly: https://github.com/hipchat/hubot-hipchat/pull/174

BTW: didn't get to thank you for your ChatOps panel at DevOpsDaysRox. I was the one in the openspace that talked about the "clock in" feature that was the killer feature that got buy-in from the users. I've been working on replacing Hubot with Lita this week and have been really happy so far.

justintime avatar Apr 30 '15 21:04 justintime

Interesting... thanks for the additional details. I'm not sure how I feel about where/how this issue should be handled. I'll keep thinking about it.

Glad you've been enjoying Lita so far! Getting feedback about it that can lead to more improvements is the best thanks I could ask for. :}

jimmycuadra avatar Apr 30 '15 21:04 jimmycuadra

Another point to consider - a process supervisor is not a given. sysvinit is still widely used.

I also imagine how I might feel if a Java or Rails app I manage simply exited because a 3rd party service went away for a moment and it didn't try to reconnect.

Thanks for continuing to think about this.

dblessing avatar May 01 '15 04:05 dblessing

@jimmycuadra I'm not a huge fan of this approach. If the websocket is closed, its an error? We should handle errors right?

Would you be adverse to some retry logic that is configurable, perhaps an exponential backoff?

willejs avatar May 28 '15 11:05 willejs

lita-hipchat doesn't use WebSockets, but in any case, what type of error are you referring to? If there's something specific that is recoverable, then probably yes. I'm still hesitant about adding reconnection logic inside the adapter. I'd love to hear more specifics on how letting a process manager restart it has been a problem for folks.

jimmycuadra avatar May 28 '15 11:05 jimmycuadra

@jimmycuadra Do you have any examples of other applications that take this approach? I'm not sure you'll get many specifics on how it's worked for people because I'm not sure it's a common thing to do. Currently we use CentOS 6 and sysvinit so there is no process supervisor. We don't have any processes that routinely bomb out when external services fail and I would call it a bug if they did.

dblessing avatar May 28 '15 14:05 dblessing

I guess I'm amenable to this feature after more thought (thank you for all the comments). I don't love the idea of having to maintain it, but clearly there is demand and the goal is to make users' lives easier. I'm really just kind of gun shy about handling additional complexity in the adapter since XMPP and the xmpp4r library are already quite nasty, and I don't want to introduce things that might create more subtle bugs in the future. But if someone wants to take this on, I will accept a really solid PR. I think it will need to handle reconnection with an exponential backoff and a maximum number of attempts to be safe.

jimmycuadra avatar Jun 18 '15 08:06 jimmycuadra

We're running lita in kubernetes with built in process monitoring. We have other processes running under God and even custom scripts. While I appreciate the views about it reconnecting. I also like simplicity. I think process monitoring isn't that unusual and would suggest running lita under something even if it did handle this reconnect issue. Afterall, this isn't only reason lita might fail and process monitors have a rich feature set for dealing with these events. So practically, if making lita crash is an easy development task, I'd welcome it as an acceptable solution that moves lita forward. When/if a pull request comes with reconnect feature, I'll heed Jimmy's caution and let it bake a while before upgrading. Till then, it's manual intervention every time Hipchat's server's have a blip. Thanks for everyone who's contributed to lita.

sjernigan avatar Jan 13 '16 02:01 sjernigan