canary-auto-promote not working correctly
The Levant canary-auto-promote flag does not appear to be working. The canary deployed successfully, but the deployment is still running and requires manual promotion. Are the defaults in my server group's update configuration preventing the canary from auto-promoting?
$ levant deploy -canary-auto-promote=15 nomad/nginx.nomad
2018-10-29T19:16:38Z |INFO| template/funcs: using default Consul KV variable with value dc1
2018-10-29T19:16:38Z |INFO| template/funcs: using default Consul KV variable with value 1
2018-10-29T19:16:38Z |INFO| template/funcs: using default Consul KV variable with value true
2018-10-29T19:16:38Z |INFO| template/funcs: using default Consul KV variable with value 1
2018-10-29T19:16:38Z |INFO| template/funcs: using default Consul KV variable with value 1m
2018-10-29T19:16:38Z |INFO| template/funcs: using default Consul KV variable with value 1
2018-10-29T19:16:38Z |INFO| template/funcs: using default Consul KV variable with value 15s
2018-10-29T19:16:38Z |INFO| template/funcs: using default Consul KV variable with value 0
2018-10-29T19:16:38Z |INFO| template/funcs: using default Consul KV variable with value 2m
2018-10-29T19:16:38Z |INFO| template/funcs: using Consul KV variable with key service/nginx/image and value nginx:latest
2018-10-29T19:16:38Z |INFO| template/funcs: using default Consul KV variable with value 20
2018-10-29T19:16:38Z |INFO| template/funcs: using default Consul KV variable with value 10
2018-10-29T19:16:38Z |INFO| template/funcs: using default Consul KV variable with value 1
2018-10-29T19:16:38Z |INFO| levant/plan: group server and task nginx plan indicates change of Config:image from nginx:stable-alpine to nginx:latest
2018-10-29T19:16:38Z |INFO| levant/deploy: using dynamic count 1 for group server job_id=nginx
2018-10-29T19:16:38Z |INFO| levant/deploy: triggering a deployment job_id=nginx
2018-10-29T19:16:38Z |INFO| levant/deploy: evaluation 75fa79dc-97c9-aba0-ec0c-4b3d951886fd finished successfully job_id=nginx
2018-10-29T19:16:38Z |INFO| levant/deploy: job is not configured with update stanza, consider adding to use deployments job_id=nginx
2018-10-29T19:16:38Z |INFO| levant/job_status_checker: job has status running job_id=nginx
2018-10-29T19:16:38Z |INFO| levant/job_status_checker: all allocations in deployment of job are running job_id=nginx
2018-10-29T19:16:38Z |INFO| levant/deploy: job deployment successful job_id=nginx
$ nomad status nginx
...
Latest Deployment
ID = 6785aa24
Status = running
Description = Deployment is running but requires promotion
Deployed
Task Group  Auto Revert  Promoted  Desired  Canaries  Placed  Healthy  Unhealthy
server      true         false     1        1         1       1        0
Allocations
ID        Node ID   Task Group  Version  Desired  Status   Created     Modified
e9e9e3ae  03ef581e  server      2        run      running  14m38s ago  14m12s ago
75924020  03ef581e  server      1        run      running  1h8m ago    1h8m ago
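Until auto-promotion works, a deployment stuck like the one above can be promoted by hand with Nomad's own CLI, using the deployment ID shown by nomad status:

$ nomad deployment promote 6785aa24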
Relevant Nomad job specification file
job "nginx" {
  region      = "global"
  datacenters = [ "[[ consulKeyOrDefault "service/nginx/datacenters" "dc1" ]]" ]
  type        = "service"

  group "server" {
    count = [[ consulKeyOrDefault "service/nginx/server/count" "1" ]]

    update {
      auto_revert = [[ consulKeyOrDefault "service/nginx/server/update/auto_revert" "true" ]]
      canary      = [[ consulKeyOrDefault "service/nginx/server/update/canary" "1" ]]

      # deadline for an allocation to become healthy before it is marked unhealthy
      healthy_deadline = "[[ consulKeyOrDefault "service/nginx/server/update/healthy_deadline" "1m" ]]"

      max_parallel = [[ consulKeyOrDefault "service/nginx/server/update/max_parallel" "1" ]]

      # time an allocation must be healthy before it is marked as such
      min_healthy_time = "[[ consulKeyOrDefault "service/nginx/server/update/min_healthy_time" "15s" ]]"

      # deadline for the deployment to make progress before it is failed (0 disables)
      progress_deadline = "[[ consulKeyOrDefault "service/nginx/server/update/progress_deadline" "0" ]]"

      # delay between migrating allocations off nodes marked for draining
      stagger = "[[ consulKeyOrDefault "service/nginx/server/update/stagger" "2m" ]]"
    }

    task "nginx" {
      driver = "docker"

      config {
        image       = "[[ consulKeyOrDefault "service/nginx/image" "nginx:stable-alpine" ]]"
        dns_servers = ["169.254.1.1"]

        port_map = {
          http = 80
        }
      }

      service {
        port = "http"
        name = "canary-nginx"

        canary_tags = [
          "traefik.enable=true",
        ]
      }

      service {
        port = "http"
        name = "nginx"

        tags = [
          "traefik.enable=true",
        ]
      }

      resources {
        cpu    = [[ consulKeyOrDefault "service/nginx/resources/cpu" "20" ]]
        memory = [[ consulKeyOrDefault "service/nginx/resources/memory" "10" ]]

        network {
          mbits = [[ consulKeyOrDefault "service/nginx/resources/network/mbits" "1" ]]

          port "http" {
          }
        }
      }
    }
  }
}
Output of levant version:
Levant v0.2.5
Date: 2018-10-25T13:24:11Z
Commit: 0514741514e70caf82976c2c67f98414046b2392
Branch: 0.2.5
State: 0.2.5
Summary: 0514741514e70caf82976c2c67f98414046b2392
Output of consul version:
Consul v1.4.0-rc1 (1757fbc0a)
Output of nomad version:
Nomad v0.8.6 (ab54ebcfcde062e9482558b7c052702d4cb8aa1b+CHANGES)
Thanks for the detailed report @vvitayau, I'll take a look into this when I can, hopefully soon, and get back to you.
We are experiencing this as well. It appears to happen when the update stanza is specified at the group level rather than at the job level (which is valid). This seems to be the offending line of code: https://github.com/jrasell/levant/blob/0514741514e70caf82976c2c67f98414046b2392/levant/deploy.go#L151. The workaround is to move the update stanza to the job level, which is acceptable if the different groups all need similar update configurations.