spring-cloud-deployer-kubernetes
Custom BackoffLimit & concurrencyPolicy for SCDF Tasks are not passed to PODS while executing in Openshift environment
@Srkanna commented on Sat Sep 19 2020
I'm trying to set backoffLimit and concurrencyPolicy for batch jobs which are executed in an Openshift environment via SCDF. Currently I'm setting these two at the global server config level. The resource limits and imagePullPolicy configurations are passed to the CronJob, but backoffLimit and concurrencyPolicy are not.
I'm experiencing this in 2.6.1 and in earlier versions as well. Below is the server-config.yaml.
cloud:
  dataflow:
    task:
      platform:
        kubernetes:
          accounts:
            dev:
              limits:
                memory: 1024Mi
                cpu: 1
              entry-point-style: exec
              image-pull-policy: always
              backoffLimit: 1
              maxCrashLoopBackOffRestarts: 1
              concurrencyPolicy: forbid
datasource:
  url: ${oracle-root-url}
  username: ${oracle-root-username}
  password: ${oracle-root-password}
  driver-class-name: oracle.jdbc.OracleDriver
  testOnBorrow: true
  validationQuery: "SELECT 1"
flyway:
  enabled: false
jpa:
  hibernate:
    use-new-id-generator-mappings: true
Neither backoffLimit nor maxCrashLoopBackOffRestarts is passed to the pod configuration. I still see pods being restarted 6 times instead of once after a failure. Below is the CronJob.yaml which I extracted from the Openshift cluster console after creating the schedule in SCDF for a batch job.
kind: CronJob
apiVersion: batch/v1beta1
metadata:
  name: batchjob1
  namespace: dev-batch
  selfLink: /apis/batch/v1beta1/namespaces/dev-batch/cronjobs/batchjob1
  uid: bef709dc-fa3a-11ea-933e-001a4a1a0116
  resourceVersion: '144552724'
  creationTimestamp: '2020-09-19T05:41:20Z'
  labels:
    spring-cronjob-id: batchjob1
spec:
  schedule: '*/10 * * * *'
  concurrencyPolicy: Allow
  suspend: false
  jobTemplate:
    metadata:
      creationTimestamp: null
    spec:
      template:
        metadata:
          creationTimestamp: null
        spec:
          containers:
            - name: batchjob1
              image: >-
                docker-registry.default.svc:5000/batch/batch-job:0.0.4
              args:
                - '--spring.datasource.username=BATCH_APP'
                - '--spring.cloud.task.name=batchjob1'
                - >-
                  --spring.datasource.url=jdbc:oracle:thin:@URL
                - '--spring.datasource.driverClassName=oracle.jdbc.OracleDriver'
                - '--spring.datasource.password=password'
                - '--spring.batch.job.names=Job1'
              env:
                - name: SPRING_CLOUD_APPLICATION_GUID
                  valueFrom:
                    fieldRef:
                      apiVersion: v1
                      fieldPath: metadata.uid
              resources:
                limits:
                  cpu: '1'
                  memory: 1Gi
              terminationMessagePath: /dev/termination-log
              terminationMessagePolicy: File
              imagePullPolicy: Always
          restartPolicy: Never
          terminationGracePeriodSeconds: 30
          dnsPolicy: ClusterFirst
          serviceAccountName: default
          serviceAccount: default
          securityContext: {}
          schedulerName: default-scheduler
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 1
status: {}
Kindly let me know your inputs. @ilayaperumalg @sabbyanandan
@ilayaperumalg commented on Mon Sep 21 2020
Hi @Srkanna,
This looks like a bug. Moving this to Spring Cloud Deployer Kubernetes. Thanks for reporting.
looks more like an unimplemented use case than a bug. also, concurrencyPolicy currently isn't a supported property. supported deployer properties can be found in the corresponding documentation, for example:
https://docs.spring.io/spring-cloud-dataflow/docs/2.6.1/reference/htmlsingle/#configuration-kubernetes-deployer
I can add concurrencyPolicy directly to the cronjob.yaml created in the Openshift environment when a scheduled task is submitted. However, I couldn't do the same for backoffLimit. Is there any workaround for setting the backoff limit? @chrisjs @ilayaperumalg
please provide
- reproducible steps to create a scheduled job in the same way you are
- the changes you are trying to make when editing object(s) directly
I cannot edit the original post as it wasn't created by me, hence I'm posting the steps to reproduce here.
- SCDF installation & task creation: I followed the steps provided at https://dataflow.spring.io/docs/installation/kubernetes/kubectl. You can use the same batch job used in the documentation (Link here). Use the Kubernetes version of the documentation.
Once SCDF was deployed in the Openshift environment, I imported the batch job application as a Docker image using a Docker URI instead of a Maven repo URL. I actually built my batch application in the Openshift environment and used that Docker URI.
Now I can schedule the batch job, and the request is submitted to the Openshift environment. Below is the config-map used by the tasks.
cloud:
  dataflow:
    task:
      platform:
        kubernetes:
          accounts:
            dev:
              limits:
                memory: 1024Mi
                cpu: 1
              entry-point-style: exec
              image-pull-policy: always
              backoffLimit: 1
              maxCrashLoopBackOffRestarts: 1
              concurrencyPolicy: forbid
datasource:
  url: ${oracle-root-url}
  username: ${oracle-root-username}
  password: ${oracle-root-password}
  driver-class-name: oracle.jdbc.OracleDriver
  testOnBorrow: true
  validationQuery: "SELECT 1"
flyway:
  enabled: false
jpa:
  hibernate:
    use-new-id-generator-mappings: true
The image-pull-policy property is passed through to Openshift, but these are not:
backoffLimit: 1
maxCrashLoopBackOffRestarts: 1
concurrencyPolicy: forbid
I found that these are not passed through by going to cluster console -> Workloads -> Cronjobs, then choosing the corresponding Openshift project from the dropdown list at the top.
In the image above, I could edit the YAML for any job we scheduled in SCDF. It also accepts properties like concurrencyPolicy: forbid. However, I don't find any property for backoffLimit.
Our jobs run at very close intervals, mostly every 2-3 minutes. If a job fails, the pod is created 6 times, and it takes more than 5 minutes for all of them to complete. In the meantime the next scheduled execution also starts and fails, which creates another 6 pods. This exhausts the resources in no time.
So it would be great if there were a property to limit pod creation on failure.
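To put rough numbers on the scenario above: the Kubernetes Job documentation describes failed pods being recreated with an exponential back-off delay (10s, 20s, 40s, ...) capped at six minutes. The Java sketch below only adds those delays up. The class and method names are made up for illustration, and real timings also include scheduling and start-up time, so treat it as an estimate under those assumptions rather than a description of what the cluster does exactly.

```java
import java.time.Duration;

public class BackoffEstimate {

    // Assumed values, taken from the Kubernetes Job documentation:
    // the delay starts at 10s, doubles per retry, and is capped at 6 minutes.
    static final Duration BASE_DELAY = Duration.ofSeconds(10);
    static final Duration MAX_DELAY = Duration.ofMinutes(6);

    /** Back-off delay before the given retry (1 = first retry). */
    static Duration delayBeforeRetry(int retry) {
        if (retry < 1) {
            throw new IllegalArgumentException("retry must be >= 1");
        }
        Duration delay = BASE_DELAY;
        // Double once per earlier retry, stopping as soon as the cap is reached.
        for (int i = 1; i < retry && delay.compareTo(MAX_DELAY) < 0; i++) {
            delay = delay.multipliedBy(2);
        }
        return delay.compareTo(MAX_DELAY) > 0 ? MAX_DELAY : delay;
    }

    /** Sum of the back-off delays for the first {@code retries} retries. */
    static Duration totalDelay(int retries) {
        Duration total = Duration.ZERO;
        for (int retry = 1; retry <= retries; retry++) {
            total = total.plus(delayBeforeRetry(retry));
        }
        return total;
    }

    public static void main(String[] args) {
        // Six pods means five retries after the first failure.
        System.out.println("5 retries: " + totalDelay(5).getSeconds() + "s of back-off");
        System.out.println("1 retry:   " + totalDelay(1).getSeconds() + "s of back-off");
    }
}
```

Under these assumptions five retries add up to 310 seconds of waiting, which is in line with the "more than 5 minutes" observed and longer than a 2-3 minute schedule interval, whereas a single retry adds only 10 seconds.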
looks like there are a couple of things here.. i don't have openshift to test against but that's likely not a concern. i've made an attempt to reproduce what you're seeing and made the notes below, as well as opened some issues to track.
1 - the kubernetes property concurrencyPolicy is currently not implemented in the deployer. i have opened an enhancement issue to do so which is located at: https://github.com/spring-cloud/spring-cloud-deployer-kubernetes/issues/406. please feel free to contribute if you can.
2 - the deployer property maxCrashLoopBackOffRestarts is used only by the deployer for state checking when a container is in CrashLoopBackOff state. this is not a kubernetes property nor does it get set on any pod, job, etc. possibly there is a kubernetes property you would like to use that has the similar intended functionality. if the desired property is not currently supported by the deployer, we can have that open as a separate enhancement issue
3 - in regards to backoffLimit there are two things at play here:
a) when a task is created by default, a "bare pod" is used unless you set the deployer property createJob. this will result in the tasks being run in a kubernetes Job rather than a pod. the backoffLimit property applies to Jobs so you need to enable that, noting createJob: true:
dev:
  limits:
    memory: 1024Mi
    cpu: 1
  createJob: true
  backoffLimit: 1
while the above should work, in the current state, the backoffLimit property when set through the ConfigMap is not being passed through from data flow to the deployer properly. i have opened an issue for that located here: https://github.com/spring-cloud/spring-cloud-dataflow/issues/4186
b) to work around backoffLimit not being passed correctly via the configmap, you can set a deployer property when you launch each task, for example:
task create --name t2 --definition "timestamp"
task launch --name t2 --properties "deployer.timestamp.kubernetes.backoffLimit=1"
results in the following objects:
the job:
job.batch/t2-ddopmx83lp 0/1 45s 45s
the spawned pod:
pod/t2-ddopmx83lp-88b7q 1/1 Running 0 45s
when inspecting the job.batch/t2-ddopmx83lp object, you would then find the backoffLimit property set, ie:
backoffLimit: 1
4 - when scheduling a task, setting the backoffLimit on a CronJob object is not currently implemented - i have opened an enhancement issue here: https://github.com/spring-cloud/spring-cloud-deployer-kubernetes/issues/407
I think only https://github.com/spring-cloud/spring-cloud-dataflow/issues/4186 needs to be resolved to close this issue as it's the only "bug". the others are logged feature enhancements or incorrect property usage.
It seems it would be enough to add the following to class KubernetesDeployerProperty:
    private int backoffLimit = 0;

    public int getBackoffLimit() {
        return backoffLimit;
    }

    public void setBackoffLimit(int backoffLimit) {
        this.backoffLimit = backoffLimit;
    }
and in KubernetesScheduler:
    cronJob.getSpec().getJobTemplate().getSpec().setBackoffLimit(properties.getBackoffLimit());
and it should work. Anyone?
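A self-contained sketch of that idea follows. It deliberately does not use the real deployer or Kubernetes client classes: DeployerProps and JobSpec are hypothetical stand-ins invented for this example. The one design choice it illustrates is using a nullable Integer instead of `int backoffLimit = 0`, on the assumption that leaving the property unset should keep the Kubernetes default rather than silently disabling retries for every scheduled task.

```java
public class BackoffLimitSketch {

    /** Hypothetical stand-in for the deployer properties class. */
    static class DeployerProps {
        private Integer backoffLimit; // null means "not configured"

        Integer getBackoffLimit() {
            return backoffLimit;
        }

        void setBackoffLimit(Integer backoffLimit) {
            this.backoffLimit = backoffLimit;
        }
    }

    /** Hypothetical stand-in for the Job spec inside a CronJob's jobTemplate. */
    static class JobSpec {
        private Integer backoffLimit;

        Integer getBackoffLimit() {
            return backoffLimit;
        }

        void setBackoffLimit(Integer backoffLimit) {
            this.backoffLimit = backoffLimit;
        }
    }

    /** Copies the limit onto the job spec only when the user configured one. */
    static void applyBackoffLimit(DeployerProps props, JobSpec jobSpec) {
        if (props.getBackoffLimit() != null) {
            jobSpec.setBackoffLimit(props.getBackoffLimit());
        }
    }

    public static void main(String[] args) {
        DeployerProps configured = new DeployerProps();
        configured.setBackoffLimit(1);
        JobSpec withLimit = new JobSpec();
        applyBackoffLimit(configured, withLimit);
        System.out.println("configured: " + withLimit.getBackoffLimit());

        JobSpec untouched = new JobSpec();
        applyBackoffLimit(new DeployerProps(), untouched);
        System.out.println("unset: " + untouched.getBackoffLimit());
    }
}
```

With a primitive int defaulting to 0, the second case would write backoffLimit: 0 onto every CronJob, which may or may not be what is wanted.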
I'm not able to make this work. I tried to create a scheduler from the scdf UI, once with spring.cloud.deployer.kubernetes.backoffLimit=1 as argument and once with deployer.kubernetes.backoffLimit=1 as property, but in both cases they are not taken into account.
Any suggestion? Tnx
Hi @saulgiordani
In the "Launch Task" screen in the UI, do you see backoffLimit as an option in Deployment Platform -> Properties -> Edit? If not, what options do you see in there?
The screenshot below is for the "local" (not "kubernetes") platform.
Try instead using the property deployer.<your-app-name>.kubernetes.backoffLimit=<your-backoff-limit>.
The format is deployer.<app>.<platform>.<property-path>=<property-value>.
If you choose "Free text" in the "Launch Task -> Deployment Platform -> Properties" screen you can see how the UI sets the properties.
[Screenshot: Screen Shot 2023-03-30 at 09 27 18]
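As a small illustration of the deployer.<app>.<platform>.<property-path>=<property-value> format described above, here is a Java sketch that assembles such a line. The class and method names are invented for this example and are not part of SCDF; the "timestamp" app name is borrowed from the earlier task launch example in this thread.

```java
public class DeployerPropertyKey {

    /** Builds "deployer.<app>.<platform>.<property-path>=<value>". */
    static String format(String app, String platform, String propertyPath, String value) {
        requireText(app, "app");
        requireText(platform, "platform");
        requireText(propertyPath, "propertyPath");
        requireText(value, "value");
        return String.format("deployer.%s.%s.%s=%s", app, platform, propertyPath, value);
    }

    // Rejects null or blank segments so a malformed key is caught early.
    private static void requireText(String text, String name) {
        if (text == null || text.trim().isEmpty()) {
            throw new IllegalArgumentException(name + " must not be empty");
        }
    }

    public static void main(String[] args) {
        // Prints: deployer.timestamp.kubernetes.backoffLimit=1
        System.out.println(format("timestamp", "kubernetes", "backoffLimit", "1"));
    }
}
```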
Hi @onobc, i'm trying to add the backoffLimit from the schedules view, not the launch view
Yeh, my bad on the screens @saulgiordani - you are in the scheduler.
Still, try my suggestion of instead using the property deployer.<your-app-name>.kubernetes.backoffLimit=<your-backoff-limit>.
Note, the scheduler properties ("spring.cloud.scheduler.kubernetes") are deprecated and have been replaced w/ the deployer properties ("spring.cloud.deployer.kubernetes") - although the code still handles both.
Hi @onobc, i've tried with the following parameters in the properties text area:
- scheduler.kubernetes.taskServiceAccountName=scdf-sa
- deployer.my_app_name.kubernetes.backoffLimit=0 (also tried with deployer.my_app_name.kubernetes.backoff-limit=0 and scheduler.deployer.kubernetes.backoff-limit=0 with no luck)
The 1st property is applied correctly (if I use deployer instead of scheduler it IS NOT applied), the 2nd is not.
It seems that it's not possible. BackoffLimit is not exposed in KubernetesScheduler: https://github.com/spring-cloud/spring-cloud-deployer/blob/main/spring-cloud-deployer-kubernetes/src/main/java/org/springframework/cloud/deployer/spi/kubernetes/KubernetesScheduler.java#L237 However, KubernetesTaskLauncher exposes it
Good catch @imitbn ,
I will add this to the KubernetesScheduler as well @saulgiordani .
Great, thanks!
If all goes well, we can get it squeezed into 2.10.3 which is planned to release in a few days.
Closing this in favor of #407 as I think everything else besides that is done in this issue.
Hi,
I still cannot schedule a task with backoffLimit = 0. I put this in my scdf application.yml file:
I see that openshift cronjobs are still generated without the backoffLimit property.
What else do I have to do to make it work?
Thank you.
Hi, the following is working fine for me:
apiVersion: batch/v1
kind: CronJob
metadata:
  creationTimestamp: "2023-03-30T14:02:08Z"
  generation: 2
  labels:
    spring-cronjob-id: ewd-conversor
  name: ewd-test
  namespace: x2
  resourceVersion: "1515110"
  uid: d240493a-84de-45ba-8de9-417849036152
spec:
  concurrencyPolicy: Allow
  failedJobsHistoryLimit: 1
  jobTemplate:
    metadata:
      creationTimestamp: null
    spec:
      backoffLimit: 1
If you want to set it through the scheduler, put this as property:
scheduler.kubernetes.cron.backoffLimit=0
and you will see that the pod definition includes the backoffLimit
Thank you, but as far as I understood, this scheduler.kubernetes.cron.backoffLimit has been deprecated, hasn't it?
where should I put this scheduler.kubernetes.cron.backoffLimit=0?
EDIT: it works if I put that here:
That's the way. This behaviour was actually added in SCDF 2.10.3.