rollmint icon indicating copy to clipboard operation
rollmint copied to clipboard

Enable graceful shutdown of node instead of panicking

Open tzdybal opened this issue 1 year ago • 6 comments

We shouldn't have panics in production code. We should log the error and return the error.

What would be the risk of continuing to cycle in this loop of getting this error here?

Or we should have some way of gracefully shutting down to avoid data corruption from panics.

Originally posted by @MSevey in https://github.com/rollkit/rollkit/pull/1568#discussion_r1513022360

tzdybal avatar Mar 06 '24 21:03 tzdybal

Hi @tzdybal , can I work on this issue?

Teja2045 avatar Mar 08 '24 06:03 Teja2045

I am thinking of lifting the error to top level and handling it there

func (pb *PendingBlocks) getPendingBlocks(ctx context.Context) ([]*types.Block, error, error) {
	lastSubmitted := pb.lastSubmittedHeight.Load()
	height := pb.store.Height()

	if lastSubmitted == height {
		return nil, nil, nil
	}
	if lastSubmitted > height {
		return nil, nil, fmt.Errorf("height of last block submitted to DA (%d) is greater than height of last block (%d)",
		lastSubmitted, height)	
	}
	// rest of the code

and the caller would be something like this

blocks, err, haltingErr := m.pendingBlocks.getPendingBlocks(ctx)

if that looks a bit weird, I can try defining a custom error to differentiate between to them.

Or please do let me know if there is any other way, I will look into it @tzdybal .

Teja2045 avatar Mar 11 '24 05:03 Teja2045

Hey @Teja2045. Thanks for interest in the issue.

I think that we need a little bigger refactoring in codebase to cleanup the relations and life-cycle of various components.

Introducing second return error is non-idiomatic for Go. It will be super confusing. More idiomatic approach is just to introduce error type and check for it later using errors.Is.

tzdybal avatar Mar 11 '24 10:03 tzdybal

thanks for reply @tzdybal. that makes sense.

indeed, aside from error handling, a lot of cleanups need to be added to free the memory. If there is a bit more context on this issue, I would love to contribute, Thanks!

Teja2045 avatar Mar 11 '24 11:03 Teja2045

Needs investigating: If a panic occurs, how problematic would be to restart? Prioritize based on that. More problematic -> higher priority

Manav-Aggarwal avatar Jun 25 '24 18:06 Manav-Aggarwal

Related: https://github.com/rollkit/rollkit/issues/477

Manav-Aggarwal avatar Jun 25 '24 19:06 Manav-Aggarwal