flyctl `flaps`: retry on certain failure modes, such as 504s

Change Summary

What and Why: All flaps operations will retry if the error is known to be transient. This should improve reliability - especially in situations like fly deploy where one hiccup can (currently) stop an entire deploy.

How: Adds a wrapper in flapsutil over *flaps.Client that implements retrying on certain failure modes, like 504. NewClientWithOptions now returns the generic FlapsClient so that we can return the wrapper type instead of a raw *flaps.Client.

Documentation

[x] Fresh Produce
[ ] In superfly/docs, or asked for help from docs team
[ ] n/a

Jun 05 '24 22:06 alichay

Retrying POST/PATCH my yield unexpected results cause subsequent requests might to go to a different flaps and flyd among other issues. It's basically not Idempotent. https://flyio.discourse.team/t/flaps-what-status-codes-can-we-retry/5060/2

Jun 06 '24 11:06 rugwirobaker

Retrying POST/PATCH my yield unexpected results cause subsequent requests might to go to a different flaps and flyd among other issues. It's basically not Idempotent. https://flyio.discourse.team/t/flaps-what-status-codes-can-we-retry/5060/2

ah. that's a huge bummer. I'm going to try to see if I can get this working for GET requests then, and maybe afterwards come up with a different strategy for fly deploy machine creation.

I wonder if some operations, like SetMetadata, could be retried anyway? it's setting a named value, so even if it were called multiple times there shouldn't be any side effects...

Jun 06 '24 21:06 alichay