Chronos: the task has "retries": 0 but still retries.

I added a very simple task for testing. This is the JSON I posted:

{
  "retries": 0,
  "disable": false,
  "command": "cd /var/log/tiger; pwd; hostname; date; exit -1",
  "name": "test1",
  "schedule": "R/2016-03-28T14:58:30.0+08:00/PT2M",
  "description": "description test"
}

I expected it to fail, wait 2 minutes, and then run again. In fact, it runs almost once a second.
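The schedule string is an ISO 8601 repeating interval: `R` with no count (repeat indefinitely), a start time, and a period (`PT2M`, two minutes). The minimal sketch below assumes a hypothetical `expected_runs` helper that understands only simple `PnDTnHnMnS` periods. It lists the run times the poster expected with `"retries": 0`: one attempt every 2 minutes and nothing in between.

```python
import re
from datetime import datetime, timedelta

# Hypothetical parser: handles only periods like P1D, PT2M, PT1H30M, PT45S.
_PERIOD = re.compile(r"P(?:(\d+)D)?(?:T(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?)?")


def parse_period(period):
    """Convert a simple ISO 8601 duration such as 'PT2M' to a timedelta."""
    m = _PERIOD.fullmatch(period)
    if m is None:
        raise ValueError(f"unsupported period: {period!r}")
    days, hours, minutes, seconds = (int(g or 0) for g in m.groups())
    return timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)


def expected_runs(schedule, n):
    """First n run times for an 'R/start/period' schedule with no retries.

    The repeat count is ignored here. A failed run should simply wait for
    the next slot rather than trigger extra runs.
    """
    _repeat, start, period = schedule.split("/")
    start_dt = datetime.strptime(start, "%Y-%m-%dT%H:%M:%S.%f%z")
    step = parse_period(period)
    return [start_dt + i * step for i in range(n)]


for run in expected_runs("R/2016-03-28T14:58:30.0+08:00/PT2M", 3):
    print(run.isoformat())
```

This prints 14:58:30, 15:00:30 and 15:02:30 (all at +08:00), so any extra runs between those slots fall outside what the schedule asks for.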

However, some of my normal tasks, also with retries set to 0, work as expected.

The version of Mesos is 0.28.0. The version of Chronos is 2.4.0.

Mar 28 '16 11:03 jinzhao1994

You scheduled the task to run every 2 minutes.

Mar 29 '16 22:03 Califax

@Califax Yes, but the OP reports that the job runs "almost once a second" on failure, even though retries is set to 0.

Mar 31 '16 15:03 ddossot

I don't know when it happens. I tried to reproduce it this afternoon, but it works correctly right now.

Mar 31 '16 16:03 jinzhao1994

Sorry, I misread: the job retries almost once a second on failure. I have not seen this behavior unless the job's repeat interval was shorter than the look-ahead horizon and its retries spawned multiple copies of the job. Since your job runs every 2 minutes with 0 retries, this is something else. Let us know if you are able to reproduce it.
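The explanation above depends on the scheduler's look-ahead horizon. The simplified model below assumes each scheduling pass queues every run whose time falls inside the next horizon window. The function name and the 60-second horizon are illustrative assumptions, not Chronos internals. The model shows that a 2-minute interval never places more than one run in a window, while an interval shorter than the horizon can. It covers only the interval-versus-horizon part; the retry duplication mentioned above is not modeled.

```python
from datetime import datetime, timedelta


def runs_in_window(start, interval, window_start, horizon):
    """Run times start + k*interval (k >= 0) inside [window_start, window_start + horizon).

    Hypothetical model of one look-ahead pass, not Chronos's actual scheduler.
    """
    if interval <= timedelta(0):
        raise ValueError("interval must be positive")
    # Smallest k >= 0 with start + k*interval >= window_start (ceiling division).
    k = max(0, -((start - window_start) // interval))
    window_end = window_start + horizon
    runs = []
    t = start + k * interval
    while t < window_end:
        runs.append(t)
        t += interval
    return runs


start = datetime(2016, 3, 28, 14, 58, 30)
horizon = timedelta(seconds=60)  # assumed horizon length, for illustration only

# 2-minute interval: at most one run per 60-second window.
print(len(runs_in_window(start, timedelta(minutes=2), datetime(2016, 3, 28, 15, 0), horizon)))  # 1
# 20-second interval: several runs land in the same window.
print(len(runs_in_window(start, timedelta(seconds=20), start, horizon)))  # 3
```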

Apr 01 '16 15:04 Califax

My QA team has been able to reproduce the issue by doing the following:

1. Created a synchronous job with Retry Count 0, Repeat interval 2 mins, Repeat Count 3.
2. Set the start time to 2016-07-06 16:10 America/Los_Angeles.
3. The job started running at the scheduled time and got SUCCESS.
4. Immediately disabled (not deleted) the job (before the second iteration started) and changed it so future runs would fail. Then enabled it again.
5. The job immediately ran again and failed (as expected). But it was supposed to run at 16:12, since the repeat interval was set to 2 mins. After this, it ran many times (50), even though the repeat count was just 3 and the retry count was 0.

It seems that at step 5 the job is retried every second until the next horizon minute is reached (step 4 took ~10 seconds to perform).
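The reported count fits that reading. As a rough check, assume the first failing run starts about 10 seconds past 16:10, because step 4 took ~10 seconds. Also assume the job then reruns once per second until the top of the next minute. The hypothetical `spurious_run_times` helper below counts those runs. Both the start offset and the top-of-minute boundary are assumptions taken from the comment above, not measured values.

```python
from datetime import datetime, timedelta


def spurious_run_times(first_failure, retry_period=timedelta(seconds=1)):
    """Hypothetical model: after first_failure, the job reruns every
    retry_period until the next top-of-minute (horizon) boundary."""
    boundary = first_failure.replace(second=0, microsecond=0) + timedelta(minutes=1)
    times = []
    t = first_failure
    while t < boundary:
        times.append(t)
        t += retry_period
    return times


runs = spurious_run_times(datetime(2016, 7, 6, 16, 10, 10))
print(len(runs))  # 50
```

Under these assumptions, the runs from 16:10:10 through 16:10:59 give 50 executions. That matches the ~50 runs the QA team observed and is far more than a repeat count of 3 with 0 retries implies.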

Jul 07 '16 18:07 dandew

I've not been able to get retries to work as expected. I have "retries": 2, but I either get:

No retries at all on failure
Infinite retries on failure

Here's a job which was seeing infinite retries:

[
  {
    "name": "my-job",
    "command": "/usr/local/deploy/bin/run_job 'do-that-thing'",
    "shell": true,
    "epsilon": "PT60S",
    "executor": "",
    "executorFlags": "",
    "retries": 2,
    "owner": "[email protected]",
    "ownerName": "",
    "description": "",
    "async": false,
    "successCount": 0,
    "errorCount": 20,
    "lastSuccess": "",
    "lastError": "2017-11-07T22:46:09.345Z",
    "cpus": 0.1,
    "disk": 256,
    "mem": 2048,
    "disabled": false,
    "softError": false,
    "dataProcessingJobType": false,
    "errorsSinceLastSuccess": 20,
    "uris": [
      "file:///etc/mesos/.dockercfg"
    ],
    "environmentVariables": [],
    "arguments": [],
    "highPriority": false,
    "runAsUser": "root",
    "container": {
      "type": "docker",
      "image": "quay.io/my-containers/my-container:tag",
      "network": "BRIDGE",
      "volumes": [],
      "forcePullImage": false
    },
    "constraints": [],
    "schedule": "R//P1D",
    "scheduleTimeZone": "America/Los_Angeles"
  }
]
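For comparison, the minimal sketch below shows the bounded behavior the `retries` field is expected to give: one initial attempt plus at most `retries` retries per scheduled run. The `run_with_retries` helper is hypothetical. It illustrates the expected semantics only and is not Chronos's implementation.

```python
from typing import Callable, Tuple


def run_with_retries(attempt: Callable[[], bool], retries: int) -> Tuple[bool, int]:
    """Call attempt() once, then retry up to `retries` more times while it fails.

    Returns (succeeded, attempts_made).
    """
    if retries < 0:
        raise ValueError("retries must be >= 0")
    attempts = 0
    for _ in range(retries + 1):
        attempts += 1
        if attempt():
            return True, attempts
    return False, attempts


# A command that always fails: 1 attempt + 2 retries.
print(run_with_retries(lambda: False, retries=2))  # (False, 3)
# With retries=0, a failing run is attempted exactly once.
print(run_with_retries(lambda: False, retries=0))  # (False, 1)
```

Under this reading, a single daily run of the job above (`"retries": 2`) would record at most 3 errors, not an unbounded number.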

Nov 07 '17 23:11 deanmorin
