[hailtop] Don't assume exact error message match for ClientPayloadError retrying

Open daniel-goldstein opened this issue 1 month ago • 0 comments

We originally treated ClientPayloadError as a sometimes-transient error in response to an existing aiohttp issue that can cause transient client-side errors that are difficult to distinguish from a genuinely broken server. The code in main matched the error message exactly, but that message has since changed to include more information, which broke our transient error handling. This change relaxes the match on the error message to restore transient error handling for our current version of aiohttp.
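
To illustrate the difference between the exact match and the relaxed match, here is a minimal sketch. It is not the actual hailtop code: `is_transient_payload_error` and `PAYLOAD_NOT_COMPLETED` are hypothetical names, the message prefix is an assumption about what aiohttp emits, and a stand-in exception class replaces `aiohttp.ClientPayloadError` so the snippet runs without aiohttp installed.

```python
class ClientPayloadError(Exception):
    """Stand-in for aiohttp.ClientPayloadError (assumption: same args layout)."""


# Assumed stable prefix of the message; newer versions may append detail after it.
PAYLOAD_NOT_COMPLETED = "Response payload is not completed"


def is_transient_payload_error(e: BaseException) -> bool:
    """Return True if e looks like the transient 'payload not completed' case."""
    if not isinstance(e, ClientPayloadError):
        return False
    message = str(e.args[0]) if e.args else ""
    # An exact comparison (message == PAYLOAD_NOT_COMPLETED) stops matching
    # as soon as extra information is appended; a prefix check tolerates that.
    return message.startswith(PAYLOAD_NOT_COMPLETED)
```

A prefix check still depends on the wording of the message, so it would break again if the leading text itself changed.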

I wish I had a better approach. ClientPayloadError can also be raised for malformed data, so I am reluctant to treat it as always transient, but we could perhaps make it a limited_retries_error and avoid inspecting the error message.
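
For reference, the limited-retries alternative could look roughly like the sketch below. It is only an illustration of the idea, not how hailtop implements limited_retries_error: `retry_limited`, `max_attempts`, and `base_delay` are hypothetical, and a stand-in class replaces `aiohttp.ClientPayloadError`.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class ClientPayloadError(Exception):
    """Stand-in for aiohttp.ClientPayloadError."""


def retry_limited(
    f: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call f, retrying any ClientPayloadError up to max_attempts total calls."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return f()
        except ClientPayloadError:
            # No message inspection: a malformed-data error is retried too,
            # but it surfaces once the attempt budget is spent.
            if attempt == max_attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))  # exponential backoff
    raise AssertionError("unreachable")
```

The trade-off is that a genuinely malformed response costs a few extra requests before failing, in exchange for not depending on the error message at all.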

daniel-goldstein · May 10 '24 14:05