Bot does not rejoin channels reliably after a netsplit

Open kwaaak opened this issue 6 years ago • 8 comments

It works most of the time but often enough it doesn't. Maybe this is the responsibility of the network? In any case, someone more in tune with the IRC protocol might have an elegant way of making sure the bot stays in the channel(s).

kwaaak avatar May 27 '18 13:05 kwaaak

Even "someone more in tune with the IRC protocol" can't do anything about this without raw logs of what's happening between Sopel and the IRC server.

dgw avatar May 27 '18 13:05 dgw

The bot recognises that something is wrong and initiates a reconnect:

>>1528894738.2752254	PING irc.efnet.nl
<<1528894738.2765565	:irc.efnet.nl PONG irc.efnet.nl :irc.efnet.nl
>>1528894858.3953335	PING irc.efnet.nl
>>1528894918.4552155	PING irc.efnet.nl
>>1528895021.7282913	CAP LS 302
>>1528895021.7284808	NICK botnickname
[server connection stuff]
<<1528895025.5272648	:irc.efnet.nl 001 botnickname :Welcome to the EFNet Internet Relay Chat Network botnickname
>>1528895025.527987	MODE  botnickname +B
>>1528895025.528177	JOIN #channelname
[server connection stuff, MOTD]
<<1528895025.5832474	:irc.efnet.nl 437 botnickname #channelname :Nick/channel is temporarily unavailable
[...]
>>1528896821.9092314	PRIVMSG #channelname :example message
<<1528896821.9106731	:irc.efnet.nl 404 botnickname #channelname :Cannot send to channel

The expected message after the JOIN would be:

<<1528925459.723694	:[email protected] JOIN :#channelname

Due to the state of the network, joining a channel is not possible at the time of the connection. Should the bot retry periodically to join the channels?
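A periodic retry could look roughly like the sketch below. It is standalone Python rather than Sopel code: `backoff_delays`, `join_with_retry`, `send_join`, and `is_joined` are hypothetical names, and the delay values are arbitrary assumptions, not anything EFNet documents.

```python
import time
from typing import Callable, Iterator


def backoff_delays(base: float = 30.0, factor: float = 2.0,
                   cap: float = 600.0, attempts: int = 6) -> Iterator[float]:
    """Yield `attempts` delays in seconds, growing by `factor` up to `cap`."""
    delay = base
    for _ in range(attempts):
        yield min(delay, cap)
        delay *= factor


def join_with_retry(channel: str,
                    send_join: Callable[[str], None],
                    is_joined: Callable[[str], bool],
                    sleep: Callable[[float], None] = time.sleep,
                    **backoff) -> bool:
    """Send JOIN repeatedly until the channel is joined or attempts run out.

    `send_join` and `is_joined` are hypothetical callbacks supplied by the
    caller; they stand in for whatever the bot uses to write to the socket
    and to track channel membership.
    """
    for delay in backoff_delays(**backoff):
        if is_joined(channel):
            return True
        send_join(channel)
        sleep(delay)  # wait before checking again
    return is_joined(channel)
```

The blocking `sleep` keeps the sketch short; a real bot would more likely schedule the next attempt on its existing event loop or job scheduler instead of stalling the connection thread.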

kwaaak avatar Jun 13 '18 21:06 kwaaak

I think EFNet is one of very few networks that lock channels during a netsplit. Most IRCds (AFAIK) just let users on the lost segment join channels anyway and resolve ops collisions with timestamps and/or services.

Adding this sort of logic to core doesn't seem especially worthwhile. For the majority of users, it would just waste CPU time. Once restarting (#1333) is done, a plugin could probably do it though. Or, run Sopel behind a bouncer (ZNC?) and let the bouncer handle channel joining and retries for free.

dgw avatar Jun 14 '18 00:06 dgw

Handling numeric 437 (ERR_UNAVAILRESOURCE) shouldn't be too difficult, as it does include the nick/channel that was unavailable (so there's no need for Sopel to do a lot of complicated state tracking).

I don't think there are any situations where Sopel would receive a 437 for something that isn't a channel, but this feature would definitely need someone to commit to testing it on a network that handles netsplits this way for some time before release. I'm not in a position to do so, realistically.
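To illustrate how little state is needed, here is a minimal sketch of pulling the channel out of a raw 437 line. It is plain Python, not Sopel's plugin API; `unavailable_channel` and `CHANNEL_PREFIXES` are hypothetical names. It assumes the reply has the shape seen in the log above (without the `<<` timestamp prefix) and treats a target as a channel only if it starts with one of the RFC 2811 channel prefix characters.

```python
from typing import Optional

# Assumption: the channel prefix characters defined in RFC 2811.
CHANNEL_PREFIXES = ("#", "&", "+", "!")


def unavailable_channel(line: str) -> Optional[str]:
    """Return the channel named in a 437 reply, or None for anything else.

    Expected shape: ":server 437 <our nick> <target> :<text>"
    """
    parts = line.split(None, 4)  # whitespace split, so no empty fields
    if len(parts) < 4 or not parts[0].startswith(":") or parts[1] != "437":
        return None
    target = parts[3]
    # 437 can also refer to a nick, so only accept channel-looking targets.
    return target if target.startswith(CHANNEL_PREFIXES) else None
```

A handler built on this would only need to queue a later JOIN for the returned channel, and ignore the reply when the function returns `None`.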

dgw avatar Nov 13 '18 09:11 dgw

Punting relatively minor enhancement with an existing workaround.

dgw avatar Nov 16 '19 17:11 dgw

I suggest to punt even further, to Sopel 8.x.

Exirel avatar Oct 02 '20 12:10 Exirel

> I suggest to punt even further, to Sopel 8.x.

Belatedly, I agree.

dgw avatar Feb 25 '21 01:02 dgw

Let's consider this part of the asyncio rewrite's shakedown, to be revisited when work starts on 8.1.

dgw avatar Jul 14 '22 04:07 dgw