
Spark pod not getting terminated after completion of spark job

Status: Open · Amey2400 opened this issue 3 years ago • 4 comments

Describe the bug
We use the SynapseML package to apply the LightGBM algorithm in a Spark application running in a k8s environment. Even after all execution steps have completed, the Spark pod does not terminate when the LightGBM package is used.

Expected behavior
After all steps complete, all threads should terminate and the Spark pod should be terminated.

Info:

  • SynapseML Version: v0.9.4
  • SynapseML Package: com.microsoft.azure:synapseml_2.12:0.9.4
  • Spark Version: 3.1.2
  • Spark Platform: Amazon k8s

Amey2400 · Apr 04 '22

I experience the same problem with:

  • SynapseML Version: v0.9.5
  • Spark Version: 3.2.0
  • Spark Platform: on-premises k8s

The problem is intermittent and happens only from time to time.

fonhorst · Apr 14 '22

I have this problem too.

  • SynapseML Version: v0.9.5
  • Spark Version: 3.2.0
  • Spark Platform: Google Cloud Dataproc

It also seems to appear only from time to time.

chengs · Apr 16 '22

Same problem here; with auto-scaling, it leads to unnecessary costs:

  • SynapseML Version: v0.9.5
  • Spark Version: 3.2.0
  • Spark Platform: Azure Kubernetes Service (AKS)

fox-gamer · May 14 '22

I don't know if it is related, but I have the same issue with SageMaker: the PySpark script finishes, but the ProcessingJob times out.

I am using a LightGBMRegressor, if that makes any difference.

  • SynapseML Version: v0.9.5
  • Spark Version: 3.1.
  • Spark Platform: SageMaker ProcessingJob

AlexandreOuellet · Jul 05 '22