chronos
chronos copied to clipboard
The task has "retries": 0 but still retries.
I add a very simple task when I test. The json I post is this.
{
"retries": 0,
"disable": false,
"command": "cd /var/log/tiger; pwd; hostname; date; exit -1",
"name": "test1",
"schedule": "R/2016-03-28T14:58:30.0+08:00/PT2M",
"description": "description test"
}
I think it will fail, wait 2 minutes, then try another time. But in fact, it tries almost once a second.
But some of my normal task, also with retries zero, works as my expect.
The version of Mesos is 0.28.0 The version of Chronos is 2.4.0
You scheduled the task to run every 2 minutes.
@Califax Yes, but the OP reports the job runs "almost once a second" in case of failure, and despite the fact retries is set to 0.
I don't know when will it happen. I tried to make it happen this afternoon, but it works right now.
Sorry I misread the fact it retries almost once a second on failure. I have not seen this behavior unless the job had a shorter repeat interval than the look ahead horizon and the job did have retries spawning multiple copies of the job. Given it is every 2 minutes, and you have 0 retries, this is something else. Let us know if you are able to reproduce.
My QA team has been able to reproduce the issue by doing the following:
-
Created a Synchronous job with:
Retry Count - 0 Repeat interval - 2 mins Repeat Count - 3
-
Set the Start time as
2016-07-06 16:10 America/Los_Angeles -
Job started running at scheduled time and got SUCCESS
-
Immediately disabled (not deleted) the Job (Before the second iteration starts) and changed it so future runs will fail. Then enabled it again.
-
Job immediately runs again and fails (as expected). But It was supposed to be run at 16:12 as the Repeat interval was mentioned as 2 mins. After this, it got run multiple times (50) rather the repeat count was just 3 and retry count was 0.
It seems that at step 5, job is retried every second until the next horizon minute is reached (step 4 took ~10 seconds to perform).
I've not been able to get retries to work as expected. I have "retries": 2, but I either get:
- No retries at all on failure
- Infinite retries on failure
Here's a job which was seeing infinite retries:
[
{
"name": "my-job",
"command": "/usr/local/deploy/bin/run_job 'do-that-thing'",
"shell": true,
"epsilon": "PT60S",
"executor": "",
"executorFlags": "",
"retries": 2,
"owner": "[email protected]",
"ownerName": "",
"description": "",
"async": false,
"successCount": 0,
"errorCount": 20,
"lastSuccess": "",
"lastError": "2017-11-07T22:46:09.345Z",
"cpus": 0.1,
"disk": 256,
"mem": 2048,
"disabled": false,
"softError": false,
"dataProcessingJobType": false,
"errorsSinceLastSuccess": 20,
"uris": [
"file:///etc/mesos/.dockercfg"
],
"environmentVariables": [],
"arguments": [],
"highPriority": false,
"runAsUser": "root",
"container": {
"type": "docker",
"image": "quay.io/my-containers/my-container:tag",
"network": "BRIDGE",
"volumes": [],
"forcePullImage": false
},
"constraints": [],
"schedule": "R//P1D",
"scheduleTimeZone": "America/Los_Angeles"
}
]