Hubot doesn't reconnect to the stream on API restart

Open emedvedev opened this issue 9 years ago • 14 comments

hm, so after I restart with st2ctl restart, my bot stops receiving events from st2 :simple_smile: until I restart the bot too

A community report. I've run into that issue a couple of times, too. Not sure whether we should solve it in st2client or here.

emedvedev avatar Dec 09 '15 02:12 emedvedev

Yes, I reported that the other day. Let me know if you need any info from my end if you can't repro it.

emptywee avatar Dec 09 '15 12:12 emptywee

@emptywee thanks! I can, although not always. This is probably an eventstream module issue: by default a client should attempt reconnects indefinitely, and it looks like the module we use doesn't always do that. Even if that's the case, I think we'll be able to fix it or implement an additional layer of checks on our level.

emedvedev avatar Dec 09 '15 15:12 emedvedev

Hi there. I'm running a new installation to trial StackStorm and I think this issue is biting us fairly hard. Any time our Hubot loses its connection to the StackStorm API, it doesn't attempt to reconnect and is left running lame. No more StackStorm goodness. I have to manually restart Hubot :(

Is this a priority to fix? The functionality I've been able to implement quickly is great! But the reliability here is a big deal.

We're running Hubot independently. I installed this script into our previously existing bot.

Thanks for the help!

ticean avatar Mar 02 '16 21:03 ticean

Thanks for the report! We will be looking into it. Your +1 increments the priority, but there is no committed fix yet.

dzimine avatar Mar 02 '16 21:03 dzimine

@enykeev was looking into it some time ago, but it would require a fairly difficult module rewrite. @enykeev: sup?

emedvedev avatar Mar 03 '16 04:03 emedvedev

@ticean: did you install with packages or AIO installer? We still have this issue on packages, but AIO should be good.

In short, this error is caused by the stream consumer module not recognizing 5xx errors as a reason to reconnect. In AIO we apply a special fix to give stream errors special treatment: https://github.com/StackStorm/st2workroom/pull/303/files

If you chose packages as your install method, right now you can apply it manually, and in the future we'll hopefully have a better fix.
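
A minimal sketch of what "recognizing 5xx as a reason to reconnect" could look like on the client side. This is not the fix from the linked PR and not code from hubot-stackstorm or st2client; `shouldReconnect` and `backoffDelayMs` are hypothetical helpers, and the sketch assumes the stream error object exposes the HTTP status as `err.status`.

```javascript
// Hypothetical helpers for illustration only.
// Assumption: the stream error object carries the HTTP status in `err.status`.

// Decide whether a stream error should trigger a reconnect attempt.
function shouldReconnect(err) {
  if (!err || typeof err.status !== 'number') {
    // No HTTP status at all: treat it as a dropped connection and retry.
    return true;
  }
  if (err.status >= 500 && err.status <= 599) {
    // Server-side failure, e.g. the API or its proxy is restarting.
    return true;
  }
  // Anything else (such as 401 or 404) is unlikely to fix itself.
  return false;
}

// Exponential backoff with an upper bound, so a long outage
// does not turn into a tight reconnect loop.
function backoffDelayMs(attempt, baseMs = 1000, maxMs = 30000) {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}
```

With these defaults the first retry waits one second and the wait doubles on each further attempt until it reaches 30 seconds.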

emedvedev avatar Mar 03 '16 04:03 emedvedev

I installed with AIO installer, but I configured this hubot-stackstorm plugin into a pre-existing, non-AIO Hubot installation.

As an underlying problem, I'm finding that the stackstorm nginx instance is stopping (or going zombie) every night. I haven't been able to find out what's scheduled to cause that, but it definitely seems like a periodic task. The host is dedicated to stackstorm with AIO. If I could find and fix this, then it would definitely lower the urgency of the issue.

For now, though, I find our bot disconnected each morning and have to restart nginx and then the bot.

ticean avatar Mar 03 '16 20:03 ticean

I should also note that I've customized the HTTPS certs using letsencrypt. I modified the paths to the certs in /etc/st2/st2.conf. At first, I installed a cron job to renew the letsencrypt cert. I thought this might have caused the issue of nginx stopping. But nginx still goes zombie, even after the letsencrypt task is removed.

I mention this because I haven't used puppet. Maybe there's a convergence scheduled nightly? Any recommendations?

ticean avatar Mar 03 '16 20:03 ticean

I ran into this failure to reconnect too. I just changed over to the new packages and am running st2chatops on the same server as the rest of stackstorm. In the short term, could we maybe make st2ctl restart st2chatops as well?

jjm avatar Mar 23 '16 14:03 jjm

As discussed on Slack yesterday, I ran into this (again), but this time it seems to have been caused by the st2stream process hitting a traceback during log rotation.

:+1:

jjm avatar May 31 '16 07:05 jjm

@armab I just stumbled across reconnecting-eventsource. It looks like it has most of the logic we would need to implement to have hubot consistently reconnect to st2stream. What do you think of using that instead of the built-in EventSource in stackstorm.js?
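
The core of that kind of logic, recreating the source whenever it errors, can be sketched without any library. This is not how reconnecting-eventsource is implemented and not code from stackstorm.js; `keepConnected` is a hypothetical wrapper, and `createSource` stands for any function that returns an object with an `onerror` handler slot and a `close()` method, such as an EventSource instance with its message listeners already attached.

```javascript
// Hypothetical wrapper for illustration only.
// `createSource` must return a fresh stream object each time it is called.
// `schedule` defaults to setTimeout and is injectable so the retry
// timing can be controlled in tests.
function keepConnected(createSource, { retryMs = 3000, schedule = setTimeout } = {}) {
  let source = null;
  let stopped = false;
  let connects = 0;

  function connect() {
    if (stopped) return;
    connects += 1;
    const current = createSource();
    source = current;
    current.onerror = () => {
      // Handle only the first error per source, so a burst of error
      // events does not schedule several reconnects.
      current.onerror = null;
      current.close();
      schedule(connect, retryMs);
    };
  }

  connect();

  return {
    // Stop reconnecting and close the current source.
    stop() {
      stopped = true;
      if (source) source.close();
    },
    // Number of sources created so far.
    get connects() {
      return connects;
    },
  };
}
```

Because the wrapper never relies on the underlying source to retry by itself, it behaves the same whether the failure was a 5xx response, a closed socket, or anything else that surfaces as an error event.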

blag avatar Jun 06 '19 08:06 blag

@blag Seems like eventsource is used on the st2client side. I would try to reproduce and debug the issue itself first (restart st2 services + debug st2chatops), trying to understand what's going on under the hood.

For example, https://github.com/fanout/reconnecting-eventsource#when-does-the-normal-eventsource-not-reconnect advertises reconnecting on 5XX errors, whereas in our nginx.conf for st2stream we deliberately added a hack to not return such errors, working around that eventsource limitation.

But if you can catch the root cause and understand what happens at a deeper level (is the original eventsource missing closed connections, was it a specific HTTP code, or something else), that would be great. I think it's all doable, just a matter of dedication and time spent on troubleshooting. If it turns out that reconnecting-eventsource or any other fix solves the root cause, that's :100:

arm4b avatar Jun 06 '19 12:06 arm4b

@armab @blag I tested on a system with st2 3.0.0 installed, on Python 2.7.12, using these commands:

root@ewc:/opt/stackstorm/chatops# service st2api stop
root@ewc:/opt/stackstorm/chatops# service st2api start

root@ewc:/opt/stackstorm/chatops# service st2stream stop
root@ewc:/opt/stackstorm/chatops# service st2stream start

root@ewc:/opt/stackstorm/chatops# service nginx stop
root@ewc:/opt/stackstorm/chatops# service nginx start

Whether I restart them one by one as above, or stop all of the above services at once and then start them, st2chatops reconnects without issue. I will investigate this further.

jinpingh avatar Jun 10 '19 18:06 jinpingh

@jinpingh Take a look at one edge case example of this: https://github.com/StackStorm/hubot-stackstorm/issues/157#issuecomment-504168117

arm4b avatar Jun 25 '19 23:06 arm4b