airflow Add fail_on_nonzero_exit parameter to SSM operators for exit code routing

Problem

SSM operators currently fail when commands return non-zero exit codes, making it impossible to:

* Route workflows based on different exit codes
* Handle commands where non-zero exit codes represent valid business states (e.g., partial success, warnings)
* Implement conditional retry logic based on specific exit codes
* Migrate from traditional schedulers like Autosys that support exit code routing

Users have been forced to implement manual polling workarounds with custom Python tasks to handle these scenarios.

Proposal

Add a fail_on_nonzero_exit parameter (default: True) to SsmRunCommandOperator, SsmRunCommandCompletedSensor, and SsmRunCommandTrigger.

When set to False:

* Tasks complete successfully regardless of command exit codes
* Exit codes can be retrieved with SsmGetCommandInvocationOperator for routing decisions
* AWS-level failures (TimedOut, Cancelled) still raise exceptions
* Command-level failures (non-zero exit codes) are tolerated
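
The rules above can be sketched as a small decision function. This is only an illustration, not the code in this PR: the name `should_raise` and the set `AWS_LEVEL_FAILURES` are hypothetical, and it assumes SSM reports a non-zero exit code as the invocation status `Failed`.

```python
# Hypothetical sketch of the proposed behavior; names are illustrative only.

# AWS-level failure statuses named in the proposal. These always fail the task.
AWS_LEVEL_FAILURES = {"TimedOut", "Cancelled"}


def should_raise(status: str, fail_on_nonzero_exit: bool = True) -> bool:
    """Return True if the task should fail for the given invocation status."""
    if status in AWS_LEVEL_FAILURES:
        # Not affected by the new parameter.
        return True
    if status == "Failed":
        # Assumption: "Failed" is how SSM reports a non-zero exit code.
        # Tolerated only when the caller opts out of failing.
        return fail_on_nonzero_exit
    return False
```

With the default of True, a `Failed` status still fails the task; with False, only the AWS-level statuses do.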

The default value of True maintains existing behavior for backward compatibility.
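
When the parameter is set to False, a downstream step could map the retrieved exit code to the next task, for example as the callable of a branching task. The sketch below is a minimal, hedged illustration: the function name, the task IDs, and the meaning attached to each exit code are all made up for the example, and it assumes the exit code has already been extracted as an integer from the SsmGetCommandInvocationOperator result.

```python
# Hypothetical routing sketch; task IDs and exit-code meanings are invented.

# Example mapping from exit code to the task that should run next.
ROUTES = {
    0: "load_results",    # full success
    1: "send_warning",    # e.g. partial success
    2: "retry_command",   # e.g. transient error worth retrying
}


def choose_next_task(exit_code: int) -> str:
    """Return the ID of the downstream task for a given exit code."""
    # Unknown exit codes fall through to a catch-all handler.
    return ROUTES.get(exit_code, "handle_unexpected_exit")
```

Keeping the mapping in a plain dictionary makes the routing table easy to review and to extend per command.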

Nov 03 '25 15:11 ksharlandjiev

My general comments:

* I think the idea is great; this is indeed a feature users might want. Thanks for creating a PR for it.

* Creating documentation is great; however, I think the document is way too long. This is only my personal opinion, so I would wait to see what others think, but if we create documentation this big for a single new parameter, it will be impossible to maintain. Using AI to create code and/or documentation is great, but we should also keep in mind that longer is NOT better. Again, I support the documentation, but this is way too big for me. As a user I probably won't read it all, and as a developer I am scared that we will need to maintain it.

* The system test is a great idea. Could you please move these 3 examples into the current system test?

Thanks for your feedback. I was on the fence myself on the extra docs, and I understand the concern. I'm happy to move all documented patterns to an external article.

Nov 03 '25 22:11 ksharlandjiev

Have you run the system test to ensure that it's working correctly?

Thanks for the approval! I’ve added a few additional tests to the system test to cover this change, following @vincbeck’s feedback, and I can confirm that everything executes successfully.

Dec 10 '25 23:12 ksharlandjiev