CDAP-19465 Refactoring the kill() call in RemoteExecutionTwillController.complete()
CDAP-19465: In tethered mode, RemoteExecutionTwillController issues a kill despite a successful pipeline run
Currently, RemoteExecutionTwillController calls kill() even when a pipeline completes successfully. In Dataproc mode this works fine, because kill() signals Dataproc to clean up resources. In tethered mode this behavior is not required.
Solution: Add a kill(RuntimeJobDetail) method to TetheringRuntimeJobManager that handles the case where the program has already completed, so that no terminate request is issued.
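The idea can be illustrated with a minimal, self-contained sketch. It is not the actual CDAP code: JobStatus, JobDetail, TetheredJobManager and the killRequests list are hypothetical stand-ins for the real RuntimeJobDetail and TetheringRuntimeJobManager types. Treating UNKNOWN as a terminal status is an assumption, chosen only to match the "No need to kill" log line in the tethered-mode test output.

```java
import java.util.ArrayList;
import java.util.List;

public class TetheredKillSketch {

  // Hypothetical stand-in for the job status; UNKNOWN is assumed to be terminal here.
  enum JobStatus {
    RUNNING(false), COMPLETED(true), FAILED(true), UNKNOWN(true);

    private final boolean terminated;

    JobStatus(boolean terminated) {
      this.terminated = terminated;
    }

    boolean isTerminated() {
      return terminated;
    }
  }

  // Hypothetical stand-in for the detail object passed to kill().
  static final class JobDetail {
    final String runId;
    final JobStatus status;

    JobDetail(String runId, JobStatus status) {
      this.runId = runId;
      this.status = status;
    }
  }

  // Hypothetical stand-in for the tethering job manager.
  static final class TetheredJobManager {
    // Records the runs for which a kill request would be sent to the tethered peer.
    final List<String> killRequests = new ArrayList<>();

    void kill(JobDetail detail) {
      if (detail.status.isTerminated()) {
        // The run has already finished, so there is nothing to terminate.
        System.out.println("No need to kill program run " + detail.runId
            + ", status " + detail.status);
        return;
      }
      // The run is still active, so issue the kill request.
      killRequests.add(detail.runId);
    }
  }

  public static void main(String[] args) {
    TetheredJobManager manager = new TetheredJobManager();
    manager.kill(new JobDetail("finished-run", JobStatus.UNKNOWN));
    manager.kill(new JobDetail("active-run", JobStatus.RUNNING));
    System.out.println("Kill requests issued for: " + manager.killRequests);
  }
}
```

With this guard in the tethering-specific manager, the shared completion path in the controller can keep calling kill() unconditionally, and Dataproc retains its cleanup behavior.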
Testing: Deployed the CDAP image on K8s and ran a pipeline in tethered mode. Logs on pipeline completion in tethered mode:
2022-09-22 18:08:20,532 - DEBUG [runtime-scheduler-10:i.c.c.c.s.AbstractRetryableScheduledService@109] - Stopping scheduled service runtime-service-16a1c6c4-3aa1-11ed-b7bb-9edf9266a03f
2022-09-22 18:08:20,533 - INFO [runtime-scheduler-10:i.c.c.i.a.r.d.AbstractTwillProgramController@75] - Twill program terminated: program_run:default.tethering-pipeline.-SNAPSHOT.workflow.DataPipelineWorkflow.16a1c6c4-3aa1-11ed-b7bb-9edf9266a03f, twill runId: 16a1c6c4-3aa1-11ed-b7bb-9edf9266a03f, status: SUCCEEDED
2022-09-22 18:08:20,534 - DEBUG [pcontroller-program:default.tethering-pipeline.-SNAPSHOT.workflow.DataPipelineWorkflow-16a1c6c4-3aa1-11ed-b7bb-9edf9266a03f-2:i.c.c.a.r.AbstractProgramRuntimeService@334] - RuntimeInfo removed: program_run:default.tethering-pipeline.-SNAPSHOT.workflow.DataPipelineWorkflow.16a1c6c4-3aa1-11ed-b7bb-9edf9266a03f
2022-09-22 18:08:20,539 - DEBUG [program.status:i.c.c.i.a.r.d.r.RemoteExecutionTwillController@138] - Force termination of remote process for program run program_run:default.tethering-pipeline.-SNAPSHOT.workflow.DataPipelineWorkflow.16a1c6c4-3aa1-11ed-b7bb-9edf9266a03f with status UNKNOWN
2022-09-22 18:08:20,539 - DEBUG [program.status:i.c.c.i.a.r.d.r.RuntimeJobRemoteProcessController@90] - Force stopping program run program_run:default.tethering-pipeline.-SNAPSHOT.workflow.DataPipelineWorkflow.16a1c6c4-3aa1-11ed-b7bb-9edf9266a03f based on program status
2022-09-22 18:08:20,563 - DEBUG [program.status:i.c.c.i.t.r.s.r.TetheringRuntimeJobManager@164] - No need to kill Program run ProgramRunInfo{namespace='default', application='tethering-pipeline', version='-SNAPSHOT', programType='WORKFLOW', program='DataPipelineWorkflow', run='16a1c6c4-3aa1-11ed-b7bb-9edf9266a03f'} with configuration: tethered instance name hdf-itn-instance, namespace default, status UNKNOWN
2022-09-22 18:08:20,597 - DEBUG [program.status:i.c.c.i.p.t.ProvisioningTask@86] - Created DEPROVISION task for program run program_run:default.tethering-pipeline.-SNAPSHOT.workflow.DataPipelineWorkflow.16a1c6c4-3aa1-11ed-b7bb-9edf9266a03f.
2022-09-22 18:08:20,613 - DEBUG [provisioning-task-5:i.c.c.i.p.t.ProvisioningTask@125] - Executing DEPROVISION subtask REQUESTING_DELETE for program run program_run:default.tethering-pipeline.-SNAPSHOT.workflow.DataPipelineWorkflow.16a1c6c4-3aa1-11ed-b7bb-9edf9266a03f.
2022-09-22 18:08:20,613 - DEBUG [provisioning-task-5:i.c.c.i.p.t.ProvisioningTask@129] - Completed DEPROVISION subtask REQUESTING_DELETE for program run program_run:default.tethering-pipeline.-SNAPSHOT.workflow.DataPipelineWorkflow.16a1c6c4-3aa1-11ed-b7bb-9edf9266a03f.
2022-09-22 18:08:20,762 - DEBUG [provisioning-task-5:i.c.c.i.p.t.ProvisioningTask@116] - Completed DEPROVISION task for program run program_run:default.tethering-pipeline.-SNAPSHOT.workflow.DataPipelineWorkflow.16a1c6c4-3aa1-11ed-b7bb-9edf9266a03f.
Logs on pipeline completion in non-tethered mode (using a Dataproc compute profile):
2022-09-22 18:17:11,082 - DEBUG [runtime-scheduler-9:i.c.c.c.s.AbstractRetryableScheduledService@109] - Stopping scheduled service runtime-service-1754cda5-3aa2-11ed-9944-9edf9266a03f
2022-09-22 18:17:11,083 - INFO [runtime-scheduler-9:i.c.c.i.a.r.d.AbstractTwillProgramController@75] - Twill program terminated: program_run:default.dataproc-pipeline.-SNAPSHOT.workflow.DataPipelineWorkflow.1754cda5-3aa2-11ed-9944-9edf9266a03f, twill runId: 1754cda5-3aa2-11ed-9944-9edf9266a03f, status: SUCCEEDED
2022-09-22 18:17:11,084 - DEBUG [pcontroller-program:default.dataproc-pipeline.-SNAPSHOT.workflow.DataPipelineWorkflow-1754cda5-3aa2-11ed-9944-9edf9266a03f-1:i.c.c.a.r.AbstractProgramRuntimeService@334] - RuntimeInfo removed: program_run:default.dataproc-pipeline.-SNAPSHOT.workflow.DataPipelineWorkflow.1754cda5-3aa2-11ed-9944-9edf9266a03f
2022-09-22 18:17:13,442 - DEBUG [program.status:i.c.c.i.a.r.d.r.RemoteExecutionTwillController@138] - Force termination of remote process for program run program_run:default.dataproc-pipeline.-SNAPSHOT.workflow.DataPipelineWorkflow.1754cda5-3aa2-11ed-9944-9edf9266a03f with status COMPLETED
2022-09-22 18:17:13,442 - DEBUG [program.status:i.c.c.i.a.r.d.r.RuntimeJobRemoteProcessController@90] - Force stopping program run program_run:default.dataproc-pipeline.-SNAPSHOT.workflow.DataPipelineWorkflow.1754cda5-3aa2-11ed-9944-9edf9266a03f based on program status
2022-09-22 18:17:13,501 - DEBUG [program.status:i.c.c.i.p.t.ProvisioningTask@86] - Created DEPROVISION task for program run program_run:default.dataproc-pipeline.-SNAPSHOT.workflow.DataPipelineWorkflow.1754cda5-3aa2-11ed-9944-9edf9266a03f.
2022-09-22 18:17:13,514 - DEBUG [provisioning-task-3:i.c.c.i.p.t.ProvisioningTask@125] - Executing DEPROVISION subtask REQUESTING_DELETE for program run program_run:default.dataproc-pipeline.-SNAPSHOT.workflow.DataPipelineWorkflow.1754cda5-3aa2-11ed-9944-9edf9266a03f.