`flaps`: retry on certain failure modes, such as 504s

Open alichay opened this issue 1 year ago • 2 comments

Change Summary

What and Why: All flaps operations will retry if the error is known to be transient. This should improve reliability - especially in situations like fly deploy where one hiccup can (currently) stop an entire deploy.

How: Adds a wrapper in `flapsutil` over `*flaps.Client` that implements retrying on certain failure modes, like 504. `NewClientWithOptions` now returns the generic `FlapsClient` so that we can return the wrapper type instead of a raw `*flaps.Client`.


Documentation

  • [x] Fresh Produce
  • [ ] In superfly/docs, or asked for help from docs team
  • [ ] n/a

alichay, Jun 05 '24 22:06

Retrying POST/PATCH may yield unexpected results, because subsequent requests might go to a different flaps and flyd, among other issues. It's basically not idempotent. https://flyio.discourse.team/t/flaps-what-status-codes-can-we-retry/5060/2

rugwirobaker, Jun 06 '24 11:06

> Retrying POST/PATCH may yield unexpected results, because subsequent requests might go to a different flaps and flyd, among other issues. It's basically not idempotent. https://flyio.discourse.team/t/flaps-what-status-codes-can-we-retry/5060/2

ah. that's a huge bummer. I'm going to try to see if I can get this working for GET requests then, and maybe afterwards come up with a different strategy for fly deploy machine creation.

I wonder if some operations, like SetMetadata, could be retried anyway? it's setting a named value, so even if it were called multiple times there shouldn't be any side effects...
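A rough sketch of what that gating might look like: retry GET requests unconditionally, and anything else only if the operation is on an explicit allow-list. `canRetry` and `retrySafeOps` are hypothetical names, and whether `SetMetadata` is actually safe to repeat is the open question above, not something this sketch settles.

```go
package main

import "net/http"

// retrySafeOps is a hypothetical allow-list of non-GET operations that are
// believed to be safe to repeat. SetMetadata is listed only as the candidate
// raised in this thread; its safety is an assumption.
var retrySafeOps = map[string]bool{
	"SetMetadata": true,
}

// canRetry reports whether a request may be retried: GET requests always,
// other methods only when the named operation is on the allow-list.
func canRetry(method, op string) bool {
	if method == http.MethodGet {
		return true
	}
	return retrySafeOps[op]
}
```

Keeping the allow-list explicit means a new POST/PATCH operation is never retried by default; someone has to opt it in after checking that repeating it is harmless.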

alichay, Jun 06 '24 21:06