spring-cloud-deployer-kubernetes

Custom BackoffLimit & concurrencyPolicy for SCDF Tasks are not passed to PODS while executing in Openshift environment

Open ilayaperumalg opened this issue 4 years ago • 6 comments

@Srkanna commented on Sat Sep 19 2020

I'm trying to set backoffLimit and concurrencyPolicy for batch jobs that are executed in an Openshift environment via SCDF. Currently I'm setting both at the global server config level. The resource limits and imagePullPolicy settings are passed to the CronJob, but backoffLimit and concurrencyPolicy are not.

I'm experiencing this in 2.6.1 and earlier versions as well. Below is the server-config.yaml.

spring:
  cloud:
    dataflow:
      task:
        platform:
          kubernetes:
            accounts:
              dev:
                limits:
                    memory: 1024Mi
                    cpu: 1
                entry-point-style: exec
                image-pull-policy: always
                backoffLimit: 1
                maxCrashLoopBackOffRestarts: 1
                concurrencyPolicy: forbid
  datasource:
    url: ${oracle-root-url}
    username: ${oracle-root-username}
    password: ${oracle-root-password}
    driver-class-name: oracle.jdbc.OracleDriver
    testOnBorrow: true
    validationQuery: "SELECT 1"
  flyway:
    enabled: false
  jpa:
    hibernate:
      use-new-id-generator-mappings: true

Neither backoffLimit nor maxCrashLoopBackOffRestarts is passed to the pod configuration. I still see pods being restarted 6 times instead of once after a failure. Below is the CronJob.yaml which I extracted from the Openshift cluster console after creating the schedule in SCDF for a batch job.

kind: CronJob
apiVersion: batch/v1beta1
metadata:
  name: batchjob1
  namespace: dev-batch
  selfLink: /apis/batch/v1beta1/namespaces/dev-batch/cronjobs/batchjob1
  uid: bef709dc-fa3a-11ea-933e-001a4a1a0116
  resourceVersion: '144552724'
  creationTimestamp: '2020-09-19T05:41:20Z'
  labels:
    spring-cronjob-id: batchjob1
spec:
  schedule: '*/10 * * * *'
  concurrencyPolicy: Allow
  suspend: false
  jobTemplate:
    metadata:
      creationTimestamp: null
    spec:
      template:
        metadata:
          creationTimestamp: null
        spec:
          containers:
            - name: batchjob1
              image: >-
                docker-registry.default.svc:5000/batch/batch-job:0.0.4
              args:
                - '--spring.datasource.username=BATCH_APP'
                - '--spring.cloud.task.name=batchjob1'
                - >-
                  --spring.datasource.url=jdbc:oracle:thin:@URL
                - '--spring.datasource.driverClassName=oracle.jdbc.OracleDriver'
                - '--spring.datasource.password=password'
                - '--spring.batch.job.names=Job1'
              env:
                - name: SPRING_CLOUD_APPLICATION_GUID
                  valueFrom:
                    fieldRef:
                      apiVersion: v1
                      fieldPath: metadata.uid
              resources:
                limits:
                  cpu: '1'
                  memory: 1Gi
              terminationMessagePath: /dev/termination-log
              terminationMessagePolicy: File
              imagePullPolicy: Always
          restartPolicy: Never
          terminationGracePeriodSeconds: 30
          dnsPolicy: ClusterFirst
          serviceAccountName: default
          serviceAccount: default
          securityContext: {}
          schedulerName: default-scheduler
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 1
status: {}

Kindly let me know your inputs. @ilayaperumalg @sabbyanandan


@ilayaperumalg commented on Mon Sep 21 2020

Hi @Srkanna,

This looks like a bug. Moving this to Spring Cloud Deployer Kubernetes. Thanks for reporting.

ilayaperumalg avatar Sep 21 '20 11:09 ilayaperumalg

looks more like an unimplemented use case than a bug. also, concurrencyPolicy is currently not a supported property. supported deployer properties can be found in the corresponding documentation, for example:

https://docs.spring.io/spring-cloud-dataflow/docs/2.6.1/reference/htmlsingle/#configuration-kubernetes-deployer

chrisjs avatar Sep 21 '20 12:09 chrisjs

I can add concurrencyPolicy directly to the cronjob.yaml created in the Openshift environment when a scheduled task is submitted. However, I couldn't do the same for backoffLimit. Is there any workaround for setting the backoff limit? @chrisjs @ilayaperumalg

Srkanna avatar Sep 30 '20 10:09 Srkanna

please provide

  • reproducible steps to create a scheduled job in the same way you are

  • the changes you are trying to make when editing object(s) directly

chrisjs avatar Sep 30 '20 12:09 chrisjs

I cannot edit the actual post as it's not created by me. Hence posting the steps to reproduce.

  1. SCDF Installation & Task creation: I followed the steps provided at https://dataflow.spring.io/docs/installation/kubernetes/kubectl. You can use the same batch job used in the documentation (Link here). Use the Kubernetes version of the documentation.

Once SCDF was deployed in the Openshift environment, I imported the batch job application as a Docker image using a Docker URI instead of a Maven repo URL. I built my batch application in the Openshift environment and used that Docker URI.

Now I can schedule the batch job, and the request is submitted to the Openshift environment. Below is the config-map used by the tasks.

spring:
  cloud:
    dataflow:
      task:
        platform:
          kubernetes:
            accounts:
              dev:
                limits:
                    memory: 1024Mi
                    cpu: 1
                entry-point-style: exec
                image-pull-policy: always
                backoffLimit: 1
                maxCrashLoopBackOffRestarts: 1
                concurrencyPolicy: forbid
  datasource:
    url: ${oracle-root-url}
    username: ${oracle-root-username}
    password: ${oracle-root-password}
    driver-class-name: oracle.jdbc.OracleDriver
    testOnBorrow: true
    validationQuery: "SELECT 1"
  flyway:
    enabled: false
  jpa:
    hibernate:
      use-new-id-generator-mappings: true

The image-pull-policy property is passed through to Openshift, but the following are not:

                backoffLimit: 1
                maxCrashLoopBackOffRestarts: 1
                concurrencyPolicy: forbid

I confirmed that these are not passed through to Openshift by going to cluster console -> Workloads -> Cronjobs and then choosing the corresponding Openshift project from the dropdown list at the top.

image

In the image above I could edit the yaml for any job we scheduled in scdf. It also accepts properties like concurrencyPolicy: forbid. However I don't find any property for backoffLimit.

Our jobs run at very close intervals, mostly every 2-3 minutes. If a job fails, the pod is created 6 times, and it takes more than 5 minutes for all of them to complete. In the meantime, the next scheduled execution also starts and fails, which creates another 6 pods. This exhausts the resources in no time.

So it would be great if there were a property to limit pod creation on failure.
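To put rough numbers on the pile-up described above, here is a minimal Java sketch. It assumes every scheduled run fails and uses the figures from this comment (a run every 2-3 minutes, 6 pods per failed run). The class and method names are made up for illustration, and the real count depends on how the cluster retries and cleans up pods.

```java
public class PodPileUp {

    /**
     * Pods created per hour if every scheduled run fails.
     * intervalMinutes: minutes between scheduled runs (assumed to divide 60).
     * podsPerFailedRun: pods created for one failed run.
     */
    static int podsPerHour(int intervalMinutes, int podsPerFailedRun) {
        int runsPerHour = 60 / intervalMinutes;
        return runsPerHour * podsPerFailedRun;
    }

    public static void main(String[] args) {
        // A run every 2 minutes, 6 pods per failed run.
        System.out.println(podsPerHour(2, 6)); // prints 180
        // Same schedule, but only one pod per failed run.
        System.out.println(podsPerHour(2, 1)); // prints 30
    }
}
```

Under these assumptions, capping pod creation at one per failed run cuts the hourly pod count by a factor of six.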

Srkanna avatar Oct 05 '20 17:10 Srkanna

looks like there are a couple of things here.. i don't have openshift to test against but that's likely not a concern. i've made an attempt to reproduce what you're seeing and made the notes below, as well as opened some issues to track.

1 - the kubernetes property concurrencyPolicy is currently not implemented in the deployer. i have opened an enhancement issue to do so which is located at: https://github.com/spring-cloud/spring-cloud-deployer-kubernetes/issues/406. please feel free to contribute if you can.

2 - the deployer property maxCrashLoopBackOffRestarts is used only by the deployer for state checking when a container is in CrashLoopBackOff state. this is not a kubernetes property nor does it get set on any pod, job, etc. possibly there is a kubernetes property you would like to use that has similar intended functionality. if the desired property is not currently supported by the deployer, we can have that opened as a separate enhancement issue.

3 - in regards to backoffLimit there are two things at play here:

a) by default, when a task is launched, a "bare pod" is used unless you set the deployer property createJob. setting it results in the task being run in a kubernetes Job rather than a bare pod. the backoffLimit property applies to Jobs, so you need to enable that, noting createJob: true:

              dev:
                limits:
                    memory: 1024Mi
                    cpu: 1
                createJob: true
                backoffLimit: 1

while the above should work, in the current state the backoffLimit property, when set through the ConfigMap, is not being passed from data flow to the deployer properly. i have opened an issue for that located here: https://github.com/spring-cloud/spring-cloud-dataflow/issues/4186

b) to work around backoffLimit not being passed correctly via the configmap, you can set a deployer property when you launch each task, for example:

task create --name t2 --definition "timestamp" 
task launch --name t2 --properties "deployer.timestamp.kubernetes.backoffLimit=1"

results in the following objects:

the job:

job.batch/t2-ddopmx83lp   0/1           45s        45s

the spawned pod:

pod/t2-ddopmx83lp-88b7q            1/1     Running   0          45s

when inspecting the job.batch/t2-ddopmx83lp object, you would then find the backoffLimit property set, ie:

   backoffLimit: 1

4 - when scheduling a task, setting the backoffLimit on a CronJob object is not currently implemented - i have opened an enhancement issue here: https://github.com/spring-cloud/spring-cloud-deployer-kubernetes/issues/407

I think only https://github.com/spring-cloud/spring-cloud-dataflow/issues/4186 needs to be resolved to close this issue, as it's the only "bug". the others are logged feature enhancements or incorrect property usage.

chrisjs avatar Oct 07 '20 18:10 chrisjs

It seems that it would be enough to add this to the class KubernetesDeployerProperties:

private int backoffLimit = 0;

public int getBackoffLimit() {
	return backoffLimit;
}

public void setBackoffLimit(int backoffLimit) {
	this.backoffLimit = backoffLimit;
}

and in KubernetesScheduler: cronJob.getSpec().getJobTemplate().getSpec().setBackoffLimit(properties.getBackoffLimit());

and it should work. Anyone?
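One detail worth weighing in the suggestion above: a primitive int that defaults to 0 would write backoffLimit: 0 into every CronJob, even when nothing was configured, which replaces the Kubernetes default of 6. The sketch below shows a nullable variant. It uses small stand-in classes so it runs on its own; DeployerProps and JobSpec are hypothetical placeholders, not the deployer's or the Kubernetes client's real types.

```java
public class BackoffLimitSketch {

    /** Hypothetical stand-in for the deployer's properties class. */
    static class DeployerProps {
        // Nullable so "not configured" can be told apart from an explicit 0.
        private Integer backoffLimit;

        Integer getBackoffLimit() {
            return backoffLimit;
        }

        void setBackoffLimit(Integer backoffLimit) {
            this.backoffLimit = backoffLimit;
        }
    }

    /** Hypothetical stand-in for the Job spec inside a CronJob's jobTemplate. */
    static class JobSpec {
        private Integer backoffLimit;

        Integer getBackoffLimit() {
            return backoffLimit;
        }

        void setBackoffLimit(Integer backoffLimit) {
            this.backoffLimit = backoffLimit;
        }
    }

    /** Copies the limit onto the job spec only when it was configured. */
    static void applyBackoffLimit(DeployerProps props, JobSpec jobSpec) {
        if (props.getBackoffLimit() != null) {
            jobSpec.setBackoffLimit(props.getBackoffLimit());
        }
    }

    public static void main(String[] args) {
        DeployerProps props = new DeployerProps();
        JobSpec spec = new JobSpec();

        // Nothing configured: the spec stays unset, so the cluster default applies.
        applyBackoffLimit(props, spec);
        System.out.println(spec.getBackoffLimit()); // prints null

        // Explicitly configured: the value is copied through.
        props.setBackoffLimit(1);
        applyBackoffLimit(props, spec);
        System.out.println(spec.getBackoffLimit()); // prints 1
    }
}
```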

szopal avatar May 06 '22 13:05 szopal

I'm not able to make this work. I tried to create a schedule from the scdf UI, once with spring.cloud.deployer.kubernetes.backoffLimit=1 as an argument and once with deployer.kubernetes.backoffLimit=1 as a property, but in both cases the value is not taken into account.

Any suggestion? Tnx

saugion avatar Mar 30 '23 13:03 saugion

Hi @saulgiordani

In the "Launch Task" screen in the UI, do you see backoffLimit as an option in Deployment Platform -> Properties -> Edit ? If not, what options do you see in there?

The screenshot below is for the "local" (not "kubernetes") platform.

Screen Shot 2023-03-30 at 09 16 49

Try instead using the property deployer.<your-app-name>.kubernetes.backoffLimit=<your-backoff-limit>. The format is deployer.<app>.<platform>.<property-path>=<property-value>.
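As a rough illustration of the deployer.<app>.<platform>.<property-path>=<property-value> format, the sketch below assembles such a property in Java. The class and method names are hypothetical and not part of SCDF; the timestamp app and the backoffLimit value are taken from the launch example earlier in this thread.

```java
public class DeployerPropertyKey {

    /** Builds "deployer.<app>.<platform>.<property-path>=<property-value>". */
    static String deployerProperty(String app, String platform,
                                   String propertyPath, String value) {
        return "deployer." + app + "." + platform + "." + propertyPath + "=" + value;
    }

    public static void main(String[] args) {
        // The app name is the name of the app in the task definition.
        System.out.println(deployerProperty("timestamp", "kubernetes", "backoffLimit", "1"));
        // prints deployer.timestamp.kubernetes.backoffLimit=1
    }
}
```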

If you choose "Free text" in the "Launch Task -> Deployment Platform -> Properties" screen you can see how the UI sets the properties.

Screen Shot 2023-03-30 at 09 27 18

onobc avatar Mar 30 '23 14:03 onobc

Hi @onobc, i'm trying to add the backoffLimit from the schedules view, not the launch view.

Screenshot 2023-03-30 at 16 51 45

saugion avatar Mar 30 '23 14:03 saugion

Yeh, my bad on the screens @saulgiordani - you are in the scheduler.

Still, try my suggestion of instead using the property deployer.<your-app-name>.kubernetes.backoffLimit=<your-backoff-limit>.

Note, the scheduler properties ("spring.cloud.scheduler.kubernetes") are deprecated and have been replaced w/ the deployer properties ("spring.cloud.deployer.kubernetes") - although the code still handles both.

onobc avatar Mar 30 '23 15:03 onobc

Hi @onobc, i've tried with the following parameters in the properties text area:

  • scheduler.kubernetes.taskServiceAccountName=scdf-sa
  • deployer.my_app_name.kubernetes.backoffLimit=0 (also tried with deployer.my_app_name.kubernetes.backoff-limit=0 and scheduler.deployer.kubernetes.backoff-limit=0 with no luck)

The 1st property is applied correctly (if I use deployer instead of scheduler, it is NOT applied); the 2nd is not.

saugion avatar Mar 31 '23 07:03 saugion

It seems that it's not possible. BackoffLimit is not exposed in KubernetesScheduler:

https://github.com/spring-cloud/spring-cloud-deployer/blob/main/spring-cloud-deployer-kubernetes/src/main/java/org/springframework/cloud/deployer/spi/kubernetes/KubernetesScheduler.java#L237

However, KubernetesTaskLauncher exposes it.

imitbn avatar Apr 26 '23 08:04 imitbn

Good catch @imitbn ,

I will add this to the KubernetesScheduler as well @saulgiordani .

onobc avatar Apr 26 '23 16:04 onobc

Great, thanks!

saugion avatar Apr 26 '23 16:04 saugion

If all goes well, we can get it squeezed into 2.10.3 which is planned to release in a few days.

onobc avatar Apr 26 '23 16:04 onobc

Closing this in favor of #407 as I think everything else besides that is done in this issue.

onobc avatar Apr 26 '23 17:04 onobc

Hi,

I still cannot schedule a task with backoffLimit = 0. I put this in my scdf application.yml file: image

I see that the Openshift cronjobs are still generated without the backoffLimit property.

What else do I have to do to make it work?

Thank you.

f

fgapito avatar May 29 '23 09:05 fgapito

Hi, the following is working fine for me:

apiVersion: batch/v1
kind: CronJob
metadata:
  creationTimestamp: "2023-03-30T14:02:08Z"
  generation: 2
  labels:
    spring-cronjob-id: ewd-conversor
  name: ewd-test
  namespace: x2
  resourceVersion: "1515110"
  uid: d240493a-84de-45ba-8de9-417849036152
spec:
  concurrencyPolicy: Allow
  failedJobsHistoryLimit: 1
  jobTemplate:
    metadata:
      creationTimestamp: null
    spec:
      backoffLimit: 1

If you want to set it through the scheduler, put this as property: scheduler.kubernetes.cron.backoffLimit=0

and you will see that the pod definition includes the backoffLimit

saugion avatar May 29 '23 12:05 saugion

Thank you, but as far as I understood, scheduler.kubernetes.cron.backoffLimit has been deprecated, hasn't it?

where should I put this scheduler.kubernetes.cron.backoffLimit=0?

EDIT: it works if I put that here: image

fgapito avatar May 29 '23 12:05 fgapito

That's the way, this behaviour has actually been added in the scdf 2.10.3

saugion avatar May 29 '23 12:05 saugion