Fix(CDAP-21219): Handle CancelJob on DONE Dataproc jobs gracefully
This commit addresses an issue where CDAP pipelines were incorrectly marked as FAILED when ephemeral Dataproc cluster deprovisioning attempted to cancel a job that had already completed.
The following changes are included:
- RemoteExecutionTwillController: Added a RuntimeJobStatus check before attempting to force kill a remote process in the `complete()` method's exception handler. This prevents sending a kill command to jobs already in a terminal state.
- AbstractDataprocProvisioner: Modified `deleteClusterWithStatus` to specifically detect and handle the error returned by the Dataproc API when a CancelJob request is made on a job in the DONE state. This error is now logged as a warning and does not cause the pipeline to be marked as FAILED.
- Unit Tests: Added new unit tests for both `RemoteExecutionTwillController` and `DataprocProvisioner` to verify the new logic and prevent regressions.
- CONTRIBUTING.rst: Updated the issues link to the current JIRA URL.
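
For reviewers, the guard added to RemoteExecutionTwillController can be pictured roughly as follows. This is a minimal sketch, not the code in this commit: `JobStatus` is a hypothetical stand-in for RuntimeJobStatus (the real enum's values may differ), and `TerminalStateGuard` and `shouldForceKill` are illustrative names only.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical stand-in for RuntimeJobStatus; the real enum may define other values.
enum JobStatus {
  STARTING, RUNNING, STOPPING, COMPLETED, STOPPED, FAILED
}

// Illustrative helper, not a class from the CDAP code base.
final class TerminalStateGuard {

  // Statuses assumed to mean that the remote process has already finished.
  private static final Set<JobStatus> TERMINAL =
      EnumSet.of(JobStatus.COMPLETED, JobStatus.STOPPED, JobStatus.FAILED);

  private TerminalStateGuard() {
  }

  // Returns true when a force kill should still be sent.
  // A null status (lookup failed) keeps the pre-fix behaviour and allows the kill,
  // because the job cannot be confirmed as finished.
  static boolean shouldForceKill(JobStatus status) {
    return status == null || !TERMINAL.contains(status);
  }
}
```

In this sketch the exception handler would call `shouldForceKill` with the latest known status and skip the kill request when it returns false.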
These changes ensure that the pipeline status accurately reflects the execution result even if there are timing issues during cluster deprovisioning.
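
The error handling described for AbstractDataprocProvisioner follows a "classify, then tolerate" pattern, sketched below under stated assumptions. The matched message fragments (`FAILED_PRECONDITION` and `DONE`) are assumed for illustration and are not confirmed Dataproc API wording; `CancelJobErrorClassifier`, `isCancelOnDoneJob`, and `runTolerant` are hypothetical names, and the actual change may inspect a typed API exception instead of message text.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Illustrative helper, not a class from the CDAP code base.
final class CancelJobErrorClassifier {

  private static final Logger LOG = Logger.getLogger(CancelJobErrorClassifier.class.getName());
  private static final int MAX_CAUSE_DEPTH = 16; // guards against cyclic cause chains

  private CancelJobErrorClassifier() {
  }

  // Walks the cause chain looking for the (assumed) signature of a
  // CancelJob request rejected because the job is already DONE.
  static boolean isCancelOnDoneJob(Throwable error) {
    Throwable current = error;
    for (int depth = 0; current != null && depth < MAX_CAUSE_DEPTH; depth++) {
      String message = current.getMessage();
      if (message != null
          && message.contains("FAILED_PRECONDITION")
          && message.contains("DONE")) {
        return true;
      }
      current = current.getCause();
    }
    return false;
  }

  // Runs the cleanup action. Returns true if it finished cleanly and false if the
  // benign "already DONE" error was tolerated; any other failure is rethrown.
  static boolean runTolerant(Runnable cleanup) {
    try {
      cleanup.run();
      return true;
    } catch (RuntimeException e) {
      if (isCancelOnDoneJob(e)) {
        LOG.log(Level.WARNING, "Job already DONE; ignoring CancelJob failure", e);
        return false;
      }
      throw e;
    }
  }
}
```

Rethrowing every unrecognised error keeps real deprovisioning failures visible, so only the race between job completion and cancellation is downgraded to a warning.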
Fixes: b/460875216