arikawa v3: Gateway reconnect panic

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x790e00]

goroutine 72401 [running]:
github.com/diamondburned/arikawa/v3/gateway.(*Gateway).ReconnectCtx(0xc000aea8c0, 0xb48740, 0xc0000281b0, 0xc00000e3c0, 0xc0002e5400)
        /root/go/pkg/mod/github.com/diamondburned/arikawa/[email protected]/gateway/gateway.go:339 +0xe0
github.com/diamondburned/arikawa/v3/gateway.(*Gateway).Reconnect(...)
        /root/go/pkg/mod/github.com/diamondburned/arikawa/[email protected]/gateway/gateway.go:309
github.com/diamondburned/arikawa/v3/gateway.(*Gateway).start.func1(0xb3e5c0, 0xc00029c380)
        /root/go/pkg/mod/github.com/diamondburned/arikawa/[email protected]/gateway/gateway.go:462 +0x18c
github.com/diamondburned/arikawa/v3/utils/wsutil.(*PacemakerLoop).StartBeating.func1(0xc000055ff0, 0xc000aea918)
        /root/go/pkg/mod/github.com/diamondburned/arikawa/[email protected]/utils/wsutil/heart.go:81 +0x48
created by github.com/diamondburned/arikawa/v3/utils/wsutil.(*PacemakerLoop).StartBeating
        /root/go/pkg/mod/github.com/diamondburned/arikawa/[email protected]/utils/wsutil/heart.go:81 +0x219

And several seconds before that:

2021-10-29T00:47:47.969+0800    ERROR   [MAIN] Gateway error: event returned error: WS error: websocket: close 1001 (going away): CloudFlare WebSocket proxy restarting
main.session.func1
        /usr/local/bin/song_librarian.bot/cli/songlibrarian/main.go:159
github.com/diamondburned/arikawa/v3/gateway.(*Gateway).start.func1
        /root/go/pkg/mod/github.com/diamondburned/arikawa/[email protected]/gateway/gateway.go:461
github.com/diamondburned/arikawa/v3/utils/wsutil.(*PacemakerLoop).StartBeating.func1
        /root/go/pkg/mod/github.com/diamondburned/arikawa/[email protected]/utils/wsutil/heart.go:81

Whenever a connection issue occurred, instead of trying to figure out how to reconnect, I made it start a new state all over again. I'm wondering whether this could be the issue.

Oct 29 '21 03:10 No3371

> Whenever a connection issue occurred, instead of trying to figure out how to reconnect, I made it start a new state all over again. I'm wondering whether this could be the issue.

What's your code for this? The library should already be handling reconnection. The gateway code might not be completely race-free if you call that from another goroutine.

FWIW, that code is eventually going to be refactored to be completely thread-safe, but users aren't really supposed to do anything with it other than calling Close() anyway.

Oct 29 '21 03:10 diamondburned

Yes. For every "session", if state.ErrorLog is called, I close the state (via defer s.Close()).

This is because reconnection hasn't been working well in my experience: if I see something like "going away" or "requested to reconnect", IIRC it hangs indefinitely. Simply closing it and creating a new state solved it.

func session(sCloser chan struct{}) (err error) {
	sessionSelfCloser := make(chan struct{})
	logger.Logger.Infof("[MAIN] Session is starting...")
	sv, err = storage.Sqlite()
	if err != nil {
		return errors.Wrap(err, "Failed to get storage")
	}

	binding.Setup(sv)

	s, err := state.New("Bot " + *globalFlags.token)
	if err != nil {
		return errors.Wrap(err, "Failed to get new bot state")
	}

	s.AddIntents(gateway.IntentDirectMessages)
	s.AddIntents(gateway.IntentGuildMessages)
	s.AddIntents(gateway.IntentGuildMessageReactions)

	pendingEmbeds = make(chan *pendingEmbed, 512)

	redirectorClosed := redirectorLoop(s, sessionSelfCloser)

	err = assureCommands(s)
	if err != nil {
		logger.Logger.Fatalf("[MAIN] %v", err)
	}

	addEventHandlers(s)
	addInteractionHandlers(s)

	s.ErrorLog = func(innerErr error) {
		logger.Logger.Errorf("[MAIN] Gateway error: %v", innerErr)
		err = innerErr
		select {
		case <-sessionSelfCloser:
		default:
			close(sessionSelfCloser)
		}
	}

	s.FatalErrorCallback = func(innerErr error) {
		logger.Logger.Errorf("[MAIN] Fatal gateway error: %v", innerErr)
		err = innerErr
		select {
		case <-sessionSelfCloser:
		default:
			close(sessionSelfCloser)
		}
	}

	s.AfterClose = func(innerErr error) {
		logger.Logger.Errorf("[MAIN] After gateway closed: %v", innerErr)
		err = innerErr
		select {
		case <-sessionSelfCloser:
		default:
			close(sessionSelfCloser)
		}
	}
	
	if err := s.Open(context.Background()); err != nil {
		logger.Logger.Errorf("[MAIN] %v", err)
		select {
		case <-sessionSelfCloser:
		default:
			close(sessionSelfCloser)
		}
	}
	defer s.Close()

	u, err := s.Me()
	if err != nil {
		log.Fatalln("Failed to get myself:", err)
	}
	logger.Logger.Infof("Session: %d", u.ID)
	
	s.UpdateStatus(gateway.UpdateStatusData{
		Since:      0,
		Activities: []discord.Activity{
			{
				Name: locale.ACTIVITY,
				Type: discord.WatchingActivity,
			},
		},
		Status:     discord.OnlineStatus,
		AFK:        false,
	})

	logger.Logger.Infof("====== %s at your service ======", u.Username)

	promptClosed := startPromptLoop(s, sessionSelfCloser)

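	select {
	case <-sCloser:
		// A gateway callback may already have closed sessionSelfCloser.
		select {
		case <-sessionSelfCloser:
		default:
			close(sessionSelfCloser)
		}
	case <-sessionSelfCloser:
	}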
	s.ErrorLog = nil
	os.Stdin.WriteString("o")
	select {
	case <-promptClosed:
	case <-time.After(5 * time.Second):
	}
	<-redirectorClosed
	err = sv.Close()
	if err != nil {
		return err
	}
	logger.Logger.Infof("[MAIN] Session is closed.")
	return nil
}

Oct 29 '21 03:10 No3371

> This is because reconnection hasn't been working well in my experience: if I see something like "going away" or "requested to reconnect", IIRC it hangs indefinitely. Simply closing it and creating a new state solved it.

This is worthy of its own issue. I think it would be much better if you opened an issue for every problem the gateway has when reconnecting. Not only would it mean less code for you to write and maintain, it would also help other users a lot more.

I'll still leave this issue open and look into why it happens, but I strongly encourage you to open new issues for your reconnection problems as well.

Oct 29 '21 03:10 diamondburned

Well yeah, I was rushing to put the bot together for a private server, and disconnections happen casually around 10 times a day, so it didn't occur to me that the whole connection problem might be something you weren't aware of. The session loop also handles errors in my own logic, and it has worked well, so I've kept it as-is so far.

Actually, I decided to bypass reconnection very early on, so I may have wrongly concluded that it wasn't working. I need to confirm the issue first.

It won't be very soon, I'm afraid, but once I have spare time to test reconnection and find something, I'll post.

Oct 29 '21 04:10 No3371