Fix(CDAP-21219): Handle CancelJob on DONE Dataproc jobs gracefully

Open cjac opened this issue 4 months ago • 0 comments

This commit addresses an issue where CDAP pipelines were incorrectly marked as FAILED when ephemeral Dataproc cluster deprovisioning attempted to cancel a job that had already completed.

The following changes are included:

  1. RemoteExecutionTwillController: Added a RuntimeJobStatus check before attempting to force kill a remote process in the complete() method's exception handler. This prevents sending a kill command to jobs already in a terminal state.

  2. AbstractDataprocProvisioner: Modified deleteClusterWithStatus to specifically detect and handle the error returned by the Dataproc API when a CancelJob request is made on a job in the DONE state. This error is now logged as a warning and does not cause the pipeline to be marked as FAILED.

  3. Unit Tests: Added new unit tests for both RemoteExecutionTwillController and DataprocProvisioner to verify the new logic and prevent regressions.

  4. CONTRIBUTING.rst: Updated the issues link to the current JIRA URL.
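
As a rough illustration of the guard described in item 1, the sketch below checks a job's status before issuing a force kill. It is not the code from this change: `SketchJobStatus`, `ForceKillGuard`, and `killIfStillRunning` are hypothetical names, and the enum only approximates the idea of a status type that can report whether it is terminal.

```java
import java.util.function.Supplier;

/** Hypothetical stand-in for a remote job status; not CDAP's actual RuntimeJobStatus. */
enum SketchJobStatus {
  STARTING(false), RUNNING(false), STOPPING(false),
  COMPLETED(true), FAILED(true), STOPPED(true);

  private final boolean terminated;

  SketchJobStatus(boolean terminated) {
    this.terminated = terminated;
  }

  boolean isTerminated() {
    return terminated;
  }
}

/** Hypothetical helper showing the "check status before kill" pattern. */
final class ForceKillGuard {
  private ForceKillGuard() {
  }

  /**
   * Runs killAction only if the job is not already in a terminal state.
   *
   * @return true if the kill was issued, false if it was skipped
   */
  static boolean killIfStillRunning(Supplier<SketchJobStatus> statusLookup, Runnable killAction) {
    SketchJobStatus status = statusLookup.get();
    if (status.isTerminated()) {
      // The job already finished, so a kill request would be redundant
      // and could surface a misleading error.
      return false;
    }
    killAction.run();
    return true;
  }
}
```

Passing the status lookup as a `Supplier` keeps the check close to the kill call, which narrows (but does not eliminate) the window in which the job can finish between the two.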

These changes ensure that the pipeline status reflects the actual execution result even when cluster deprovisioning races with job completion.
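
The error handling described in item 2 could look roughly like the sketch below, which classifies a cancel failure as benign when it indicates the job was already DONE. Everything here is an assumption for illustration: `CancelOnDoneClassifier` is a hypothetical class, and both the `FAILED_PRECONDITION` code name and the message matching are guesses at how such an error might be recognized, not the actual check in `AbstractDataprocProvisioner` or the exact error text returned by the Dataproc API.

```java
import java.util.regex.Pattern;

/** Hypothetical classifier for "cancel requested on an already finished job" errors. */
final class CancelOnDoneClassifier {
  // Assumed shape of the error: a precondition failure whose message
  // mentions cancellation and the DONE state. The real text may differ.
  private static final String ASSUMED_CODE = "FAILED_PRECONDITION";
  private static final Pattern CANCEL = Pattern.compile("cancel", Pattern.CASE_INSENSITIVE);
  private static final Pattern DONE_STATE = Pattern.compile("\\bDONE\\b");

  private CancelOnDoneClassifier() {
  }

  /**
   * Returns true if the error looks like a cancel request that was rejected
   * because the job had already completed.
   */
  static boolean isCancelOnDoneJob(String statusCode, String message) {
    if (statusCode == null || message == null) {
      return false;
    }
    return ASSUMED_CODE.equals(statusCode)
        && CANCEL.matcher(message).find()
        && DONE_STATE.matcher(message).find();
  }
}
```

A caller would log a warning and continue when this returns true, and rethrow otherwise, so that unrelated deprovisioning failures still fail the run.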

Fixes: b/460875216

cjac, Nov 15 '25 00:11