
CDAP-19465 Refactoring the Kill() method in RemoteExecutionTwillController.complete()

Open ameya0111 opened this issue 2 years ago • 1 comments

CDAP 19465 In tethered mode, RemoteExecutionTwillController issues kill despite successful pipeline run

Currently, RemoteExecutionTwillController calls kill() even when the pipeline completes successfully. In Dataproc mode this works fine, because kill() signals Dataproc to clean up resources. In tethered mode this behavior is not required.

Solution: Add a kill(RuntimeJobDetail) method to TetheringRuntimeJobManager that handles the case where the program has already completed, so that no terminate is issued.
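
For illustration only, the idea can be sketched roughly as below. This is not the code in this change: JobStatus, JobDetail, and TetheredJobManager are hypothetical stand-ins for the real CDAP types, and the rule "terminate only when the run is known to be active" is an assumption chosen to match the tethered-mode logs, where a run reported as UNKNOWN is not killed.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; none of these types are the real CDAP classes.
class TetheredKillSketch {

  /** Hypothetical job states (assumed names, not the actual CDAP enum). */
  enum JobStatus { STARTING, RUNNING, STOPPING, COMPLETED, FAILED, STOPPED, UNKNOWN }

  /** Hypothetical stand-in for RuntimeJobDetail: a run id plus its last known status. */
  static final class JobDetail {
    final String runId;
    final JobStatus status;

    JobDetail(String runId, JobStatus status) {
      this.runId = runId;
      this.status = status;
    }
  }

  /** Hypothetical stand-in for TetheringRuntimeJobManager. */
  static final class TetheredJobManager {
    // Records the runs for which a terminate request would have been sent.
    final List<String> terminated = new ArrayList<>();

    void kill(JobDetail detail) {
      // Assumption: only a run known to be active needs a terminate request.
      boolean active = detail.status == JobStatus.STARTING
          || detail.status == JobStatus.RUNNING;
      if (!active) {
        System.out.println("No need to kill program run " + detail.runId
            + ", status " + detail.status);
        return;
      }
      terminated.add(detail.runId);
    }
  }

  public static void main(String[] args) {
    TetheredJobManager manager = new TetheredJobManager();
    manager.kill(new JobDetail("run-completed", JobStatus.COMPLETED)); // skipped
    manager.kill(new JobDetail("run-unknown", JobStatus.UNKNOWN));     // skipped
    manager.kill(new JobDetail("run-active", JobStatus.RUNNING));      // terminated
    System.out.println("Terminated runs: " + manager.terminated);
  }
}
```

With this shape, the controller can keep calling kill() unconditionally on completion, and the decision to skip the terminate stays inside the tethering-specific manager.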

Testing: Deployed the CDAP image on Kubernetes and ran a pipeline in tethered mode. Logs on pipeline completion in tethered mode:


2022-09-22 18:08:20,532 - DEBUG [runtime-scheduler-10:i.c.c.c.s.AbstractRetryableScheduledService@109] - Stopping scheduled service runtime-service-16a1c6c4-3aa1-11ed-b7bb-9edf9266a03f
2022-09-22 18:08:20,533 - INFO  [runtime-scheduler-10:i.c.c.i.a.r.d.AbstractTwillProgramController@75] - Twill program terminated: program_run:default.tethering-pipeline.-SNAPSHOT.workflow.DataPipelineWorkflow.16a1c6c4-3aa1-11ed-b7bb-9edf9266a03f, twill runId: 16a1c6c4-3aa1-11ed-b7bb-9edf9266a03f, status: SUCCEEDED
2022-09-22 18:08:20,534 - DEBUG [pcontroller-program:default.tethering-pipeline.-SNAPSHOT.workflow.DataPipelineWorkflow-16a1c6c4-3aa1-11ed-b7bb-9edf9266a03f-2:i.c.c.a.r.AbstractProgramRuntimeService@334] - RuntimeInfo removed: program_run:default.tethering-pipeline.-SNAPSHOT.workflow.DataPipelineWorkflow.16a1c6c4-3aa1-11ed-b7bb-9edf9266a03f
2022-09-22 18:08:20,539 - DEBUG [program.status:i.c.c.i.a.r.d.r.RemoteExecutionTwillController@138] - Force termination of remote process for program run program_run:default.tethering-pipeline.-SNAPSHOT.workflow.DataPipelineWorkflow.16a1c6c4-3aa1-11ed-b7bb-9edf9266a03f with status UNKNOWN
2022-09-22 18:08:20,539 - DEBUG [program.status:i.c.c.i.a.r.d.r.RuntimeJobRemoteProcessController@90] - Force stopping program run program_run:default.tethering-pipeline.-SNAPSHOT.workflow.DataPipelineWorkflow.16a1c6c4-3aa1-11ed-b7bb-9edf9266a03f based on program status
2022-09-22 18:08:20,563 - DEBUG [program.status:i.c.c.i.t.r.s.r.TetheringRuntimeJobManager@164] - No need to kill Program run ProgramRunInfo{namespace='default', application='tethering-pipeline', version='-SNAPSHOT', programType='WORKFLOW', program='DataPipelineWorkflow', run='16a1c6c4-3aa1-11ed-b7bb-9edf9266a03f'} with configuration: tethered instance name hdf-itn-instance, namespace default, status UNKNOWN
2022-09-22 18:08:20,597 - DEBUG [program.status:i.c.c.i.p.t.ProvisioningTask@86] - Created DEPROVISION task for program run program_run:default.tethering-pipeline.-SNAPSHOT.workflow.DataPipelineWorkflow.16a1c6c4-3aa1-11ed-b7bb-9edf9266a03f.
2022-09-22 18:08:20,613 - DEBUG [provisioning-task-5:i.c.c.i.p.t.ProvisioningTask@125] - Executing DEPROVISION subtask REQUESTING_DELETE for program run program_run:default.tethering-pipeline.-SNAPSHOT.workflow.DataPipelineWorkflow.16a1c6c4-3aa1-11ed-b7bb-9edf9266a03f.
2022-09-22 18:08:20,613 - DEBUG [provisioning-task-5:i.c.c.i.p.t.ProvisioningTask@129] - Completed DEPROVISION subtask REQUESTING_DELETE for program run program_run:default.tethering-pipeline.-SNAPSHOT.workflow.DataPipelineWorkflow.16a1c6c4-3aa1-11ed-b7bb-9edf9266a03f.
2022-09-22 18:08:20,762 - DEBUG [provisioning-task-5:i.c.c.i.p.t.ProvisioningTask@116] - Completed DEPROVISION task for program run program_run:default.tethering-pipeline.-SNAPSHOT.workflow.DataPipelineWorkflow.16a1c6c4-3aa1-11ed-b7bb-9edf9266a03f.


Logs on pipeline completion in non-tethered mode (using a Dataproc compute profile):

2022-09-22 18:17:11,082 - DEBUG [runtime-scheduler-9:i.c.c.c.s.AbstractRetryableScheduledService@109] - Stopping scheduled service runtime-service-1754cda5-3aa2-11ed-9944-9edf9266a03f
2022-09-22 18:17:11,083 - INFO  [runtime-scheduler-9:i.c.c.i.a.r.d.AbstractTwillProgramController@75] - Twill program terminated: program_run:default.dataproc-pipeline.-SNAPSHOT.workflow.DataPipelineWorkflow.1754cda5-3aa2-11ed-9944-9edf9266a03f, twill runId: 1754cda5-3aa2-11ed-9944-9edf9266a03f, status: SUCCEEDED
2022-09-22 18:17:11,084 - DEBUG [pcontroller-program:default.dataproc-pipeline.-SNAPSHOT.workflow.DataPipelineWorkflow-1754cda5-3aa2-11ed-9944-9edf9266a03f-1:i.c.c.a.r.AbstractProgramRuntimeService@334] - RuntimeInfo removed: program_run:default.dataproc-pipeline.-SNAPSHOT.workflow.DataPipelineWorkflow.1754cda5-3aa2-11ed-9944-9edf9266a03f
2022-09-22 18:17:13,442 - DEBUG [program.status:i.c.c.i.a.r.d.r.RemoteExecutionTwillController@138] - Force termination of remote process for program run program_run:default.dataproc-pipeline.-SNAPSHOT.workflow.DataPipelineWorkflow.1754cda5-3aa2-11ed-9944-9edf9266a03f with status COMPLETED
2022-09-22 18:17:13,442 - DEBUG [program.status:i.c.c.i.a.r.d.r.RuntimeJobRemoteProcessController@90] - Force stopping program run program_run:default.dataproc-pipeline.-SNAPSHOT.workflow.DataPipelineWorkflow.1754cda5-3aa2-11ed-9944-9edf9266a03f based on program status
2022-09-22 18:17:13,501 - DEBUG [program.status:i.c.c.i.p.t.ProvisioningTask@86] - Created DEPROVISION task for program run program_run:default.dataproc-pipeline.-SNAPSHOT.workflow.DataPipelineWorkflow.1754cda5-3aa2-11ed-9944-9edf9266a03f.
2022-09-22 18:17:13,514 - DEBUG [provisioning-task-3:i.c.c.i.p.t.ProvisioningTask@125] - Executing DEPROVISION subtask REQUESTING_DELETE for program run program_run:default.dataproc-pipeline.-SNAPSHOT.workflow.DataPipelineWorkflow.1754cda5-3aa2-11ed-9944-9edf9266a03f.

ameya0111 avatar Sep 20 '22 23:09 ameya0111

gitpod-io[bot] avatar Sep 20 '22 23:09 gitpod-io[bot]