Treat TimeoutError as workflow/update failure instead of task failure
What was changed
TimeoutError is now a workflow/update failure instead of a task failure. This makes sense because task failures are meant only for code problems that can be fixed by redeploying code. We do not consider it a backwards-incompatible change for something to stop being a task failure.
Checklist
- Closes #798
Is this consistent with other SDKs?
Yes. In every SDK other than Python and Ruby, a wait condition timeout is not an error; it is a boolean. Ruby and Python use language-native timeout features, and in Ruby timeouts are already workflow failures, so we just need to do the same for Python. Today, Python is the only SDK where a wait condition timeout causes a task failure, which is bad.
It seems a bit severe to fail the workflow due to a timeout error, but is the rationale "what else can we do, there's nothing to fix"? I suppose this will give users an incentive to retry workflows, but I have always been unsure whether we want workflow retries to be common.
We don't consider it severe to fail the workflow for runtime errors, and users should probably be catching this and reacting anyway. In cases where a timeout is a failure, it is a runtime failure, not a code/task failure.
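For illustration, here is a minimal sketch of what "catching this and reacting" could look like, using only the standard library. The `wait_condition` helper below is a hypothetical stand-in built on `asyncio.wait_for`, not the SDK's actual implementation, and `wait_condition_or_false` is an assumed wrapper name that mimics the boolean result described for the other SDKs.

```python
import asyncio
from typing import Callable


async def wait_condition(predicate: Callable[[], bool], timeout: float) -> None:
    """Hypothetical stand-in for a wait condition: poll until predicate is true.

    Raises asyncio.TimeoutError if the predicate is still false after `timeout` seconds.
    """

    async def poll() -> None:
        while not predicate():
            await asyncio.sleep(0.01)

    await asyncio.wait_for(poll(), timeout=timeout)


async def wait_condition_or_false(predicate: Callable[[], bool], timeout: float) -> bool:
    """Convert the timeout exception into a boolean instead of letting it propagate."""
    try:
        await wait_condition(predicate, timeout)
        return True
    except asyncio.TimeoutError:
        # The caller decides how to react; nothing is treated as a code bug here.
        return False


async def main() -> tuple[bool, bool]:
    approved = {"value": False}

    # The condition is never satisfied, so this call times out and returns False.
    timed_out_result = await wait_condition_or_false(lambda: approved["value"], timeout=0.05)

    # The condition is already satisfied, so this call returns True immediately.
    approved["value"] = True
    satisfied_result = await wait_condition_or_false(lambda: approved["value"], timeout=0.05)
    return timed_out_result, satisfied_result


if __name__ == "__main__":
    print(asyncio.run(main()))
```

If the exception is left uncaught instead, it propagates out of the calling coroutine, which is the case this change reclassifies from task failure to workflow/update failure.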