jumpy icon indicating copy to clipboard operation
jumpy copied to clipboard

Improve Network Error Recovery

Open zicklag opened this issue 2 years ago • 10 comments

Right now the networking threads will panic when there are network errors. We should instead print a warning and continue as if the error didn't happen as much as possible.

zicklag avatar Apr 19 '23 17:04 zicklag

Can I try this?

ShadowWalker98 avatar May 03 '23 21:05 ShadowWalker98

Sure!

One piece of advice for async code is to use async blocks to group code that may error, propagate the errors in the block with the ? operator, and then handle the error once, outside the block.

For example:

let result = async move {
    let something = method_that_may_fail().await?;
    let another = method_that_may_fail2().await?;
    // More code, that could also fail maybe.

    Ok(())
};

if let Err(e) = result {
   warn!("Network error: {e}");
}

zicklag avatar May 04 '23 01:05 zicklag

Okay I'll give this a shot! Which places will I need to change the network code?

ShadowWalker98 avatar May 04 '23 05:05 ShadowWalker98

The notable places are the online matchmaker and the lan matchmaker, where we do a lot of .unwrap()-ing in the matchmaking loop.

The is to only unwrap on errors that represent bugs or mistakes that we, the programmers made somewhere. All of the network errors, which might happen shouldn't panic, because that will break the matchmaking thread, and then you'll have to restart the game before you can try to join another match.


Another thing we will need to do, now that I think about it, is that we need to send the error message to the UI by adding an Error(String) variant to the MatchmakerResponse structs.

That way, if we run into, for example, a timeout error, then the UI will show that the matchmaking didn't complete because of the error, as opposed to just looking like it's still searching for a match or something like that.

If you don't want to worry about the UI yet, I'm fine just merging it with just the changes in the matchmaking loops, and I can update the UI afterwards. It's up to you!

zicklag avatar May 04 '23 13:05 zicklag

Thanks for the info! I'll try this out! I am kinda new to Rust though if that's okay, I'll try my best.

ShadowWalker98 avatar May 04 '23 13:05 ShadowWalker98

No problem at all, just let me know if you have questions!

zicklag avatar May 04 '23 13:05 zicklag

Hello zicklag! I've made some changes to my fork but haven't committed or anything yet. I'm learning about error handling and seeing how better I can code it. How much time do I have for this story? I looked into error handling and everything and trying to see how it can be used along with your guiding comment. Rust is very new to me so is it ok if I take some time? Or do you need it now now?

ShadowWalker98 avatar May 07 '23 09:05 ShadowWalker98

I have learnt about the ? shorthand for error handling but in the code I see a lot of unwrapping. As per your comment, I should group them up into a code block and then warn the user. Is there any particular place where this warning is not advisable and is instead better to make the code break?

ShadowWalker98 avatar May 07 '23 09:05 ShadowWalker98

I've made some changes to my fork but haven't committed or anything yet.

Just a tip, don't be afraid to commit stuff often, even if it isn't ready yet. This is the best way to prevent losing work, and when we merge the pull request it always gets squashed into one commit anyway.

How much time do I have for this story? I looked into error handling and everything and trying to see how it can be used along with your guiding comment. Rust is very new to me so is it ok if I take some time? Or do you need it now now?

You can take your time and figure out how everything works, no rush!

Is there any particular place where this warning is not advisable and is instead better to make the code break?

The general rule is that the code should panic if it is a developer error or any other bug in the program.

For example, there are times where a function will return an error if we pass it a number that doesn't make sense for that function. If we know that we already checked the number is valid, then we'll unwrap() that result, because we know that we already checked.

But if the error might result from any kind of outside influence, such as the user or the network, etc. then the error should be handled gracefully and should warn.

For example, whenever we make a network request, it could fail based on conditions we can't control, and that's not a bug, so we should warn when we run into the error.

That way our program will only crash if it's something we, the developers did wrong. If there are no bugs in the program, then the program should never crash, no matter what the user, network, or filesystem does.

zicklag avatar May 07 '23 20:05 zicklag

I believe there is a crash that can be produced via timeout to matchmaking. I can verify repro if needed, but I think this will do it:

  • Set matchmaking server to invalid server
  • Try to matchmake in online
  • go back to settings, fix matchmaking server
  • try again

(At some point in this stage it will probably crash due to timeout). We also don't want to crash if matchmaking server is somehow unhealthy and timeout from it. Note that the code that handles this is likely in bones networking module, maybe should make an issue there instead but just noting for now.

MaxCWhitehead avatar Mar 23 '24 21:03 MaxCWhitehead