oshinko-s2i
ephemeral clusters not getting deleted for jobs
While testing the build and job workflows I've run into a situation where it appears that ephemeral clusters are not getting deleted, even with the delete-cluster option set to true.
steps to reproduce
oc new-project test
oc create -f https://radanalytics.io/resources.yaml
oc create -f pysparkbuild.json
oc create -f pysparkjob.json
oc new-app --template oshinko-pyspark-build -p GIT_URI=https://github.com/radanalyticsio/s2i-integration-test-apps
oc new-app --template oshinko-pyspark-job -p IMAGE=<Docker pull spec here>
observed result
The cluster created for the job is never cleaned up, and the log output does not recognize the cluster as ephemeral.
logs
18/01/04 16:07:04 INFO SparkContext: Invoking stop() from shutdown hook
18/01/04 16:07:04 INFO SparkUI: Stopped Spark web UI at http://172.17.0.2:4040
18/01/04 16:07:04 INFO StandaloneSchedulerBackend: Shutting down all executors
18/01/04 16:07:04 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Asking each executor to shut down
18/01/04 16:07:04 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
18/01/04 16:07:04 INFO MemoryStore: MemoryStore cleared
18/01/04 16:07:04 INFO BlockManager: BlockManager stopped
18/01/04 16:07:04 INFO BlockManagerMaster: BlockManagerMaster stopped
18/01/04 16:07:04 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
18/01/04 16:07:04 INFO SparkContext: Successfully stopped SparkContext
18/01/04 16:07:04 INFO ShutdownHookManager: Shutdown hook called
18/01/04 16:07:04 INFO ShutdownHookManager: Deleting directory /tmp/spark-05af0c6c-9bb4-49a7-b62d-6a67b83fc749/pyspark-a5c3ba26-8a73-4e22-b314-602f07296267
18/01/04 16:07:04 INFO ShutdownHookManager: Deleting directory /tmp/spark-05af0c6c-9bb4-49a7-b62d-6a67b83fc749
Deleting cluster 'cluster-efdcc4'
cluster is not ephemeral
cluster not deleted 'cluster-efdcc4'
The pods are never deleted:
$ oc get pods
NAME                       READY     STATUS      RESTARTS   AGE
cluster-efdcc4-m-1-l7xds   1/1       Running     0          12m
cluster-efdcc4-w-1-v7kmn   1/1       Running     0          12m
pyspark-m6va-cb82v         0/1       Completed   0          12m
pyspark-y8bl-1-build       0/1       Completed   0          30m
expected result
All cluster pods should be deleted after the job has completed.
possible cause
I think the way the $ephemeral variable is calculated in this function in the common start script is what's causing the issue here; it probably needs to account for jobs differently than it does for deployments.
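For illustration only, here is a rough sketch of the shape of the problem; the function name, label, and lookups are assumptions made up for this example, not the actual contents of the start script:

# hypothetical sketch of a deploymentconfig-based ephemeral check (illustrative names)
is_ephemeral() {
    local cluster=$1
    local flag
    # ephemeral-ness is read from a label on a deploymentconfig...
    flag=$(oc get dc "$cluster" -o jsonpath='{.metadata.labels.ephemeral}' 2>/dev/null)
    # ...but a job-launched driver has no deploymentconfig carrying that label,
    # so the lookup comes back empty and the cluster is treated as non-ephemeral
    [ "$flag" = "true" ]
}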
This is a known limitation, since ephemeral-ness is tracked via labels on deploymentconfigs.
We need another solution for jobs.
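One possible direction, sketched under the assumption that the driver's job name is available to the start script at both create and delete time (the $APP_NAME and $CLUSTER_NAME variables and the ephemeral label are made up for this example):

# hypothetical alternative: record ephemeral-ness on the job object itself
# when the cluster is created for a job-launched driver...
oc label job "$APP_NAME" ephemeral=true --overwrite

# ...and have the delete path fall back to the job's label when there is
# no deploymentconfig to check
flag=$(oc get job "$APP_NAME" -o jsonpath='{.metadata.labels.ephemeral}' 2>/dev/null)
if [ "$flag" = "true" ]; then
    echo "Deleting cluster '$CLUSTER_NAME'"
    # ...proceed with the existing cluster teardown here
fi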