marathon-lb
marathon-lb reload bug
Last week, while updating a core service in our production environment (built on DC/OS), we accidentally misconfigured its health check. External requests returned 503 the whole time, until we corrected the health check configuration and restarted the service. Throughout, the old instance was shown as healthy on the Marathon page, so we suspect something goes wrong when marathon-lb reloads.
Why does the old, healthy instance stop serving traffic after we deploy a bad health check? As far as we know, nothing changes on the old healthy instance when a new unhealthy instance of the same application is launched.
Test and Verification (marathon-lb version 1.12.1)
- launch a new nginx test application (listening on port 80, health check port 80)
- change the health check port to 81 (Marathon launches a new instance whose state will never become healthy; at this point the nginx backend in haproxy.cfg has two different servers)
- test external access
haproxy.cfg
before reload
backend nginx-lbl-test_10278
balance roundrobin
mode http
option forwardfor
http-request set-header X-Forwarded-Port %[dst_port]
http-request add-header X-Forwarded-Proto https if { ssl_fc }
server 10_168_0_82_9_0_5_7_80 9.0.5.7:80 check inter 5s fall 4 port 80
after reload
backend nginx-lbl-test_10278
balance roundrobin
mode http
option forwardfor
http-request set-header X-Forwarded-Port %[dst_port]
http-request add-header X-Forwarded-Proto https if { ssl_fc }
server 10_168_0_82_9_0_5_7_80 9.0.5.7:80 check inter 5s fall 4 port 81
server 10_168_0_82_9_0_5_12_80 9.0.5.12:80 check inter 5s fall 4 port 81
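To make the difference between the two snippets explicit, here is a minimal Python sketch that extracts the traffic port and the health check port from each `server` line of a backend. The names `check_ports` and `AFTER_RELOAD` are hypothetical helpers for illustration only; they are not part of marathon-lb or HAProxy, and the parsing only covers the `server` line shape shown above.

```python
import re

# Matches lines of the form: server <name> <ip>:<port> <options...>
SERVER_RE = re.compile(
    r"^\s*server\s+(?P<name>\S+)\s+(?P<addr>[\d.]+):(?P<port>\d+)\b(?P<opts>.*)$"
)
# Matches the "port <n>" health check option inside the server options.
CHECK_PORT_RE = re.compile(r"\bport\s+(\d+)\b")


def check_ports(backend_text):
    """Return {server_name: (traffic_port, check_port)} for each server line."""
    result = {}
    for line in backend_text.splitlines():
        m = SERVER_RE.match(line)
        if not m:
            continue  # skip non-server lines such as "mode http"
        traffic_port = int(m.group("port"))
        cp = CHECK_PORT_RE.search(m.group("opts"))
        # Without an explicit "port <n>", HAProxy checks the server's own port.
        check_port = int(cp.group(1)) if cp else traffic_port
        result[m.group("name")] = (traffic_port, check_port)
    return result


# The "after reload" backend from this report (server lines only).
AFTER_RELOAD = """
backend nginx-lbl-test_10278
  server 10_168_0_82_9_0_5_7_80 9.0.5.7:80 check inter 5s fall 4 port 81
  server 10_168_0_82_9_0_5_12_80 9.0.5.12:80 check inter 5s fall 4 port 81
"""

if __name__ == "__main__":
    for name, (traffic, check) in check_ports(AFTER_RELOAD).items():
        note = "MISMATCH" if traffic != check else "ok"
        print(f"{name}: traffic={traffic} check={check} {note}")
```

Run against the "after reload" backend, both servers, including the old instance 9.0.5.7, are reported with traffic port 80 and check port 81. That matches the observation that the new health check port was applied to every server in the backend, not only to the new instance.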
So why was the old instance's health check configuration updated as well?
This is dangerous when updating applications in a production environment: with a bad health check, HAProxy failover stops working even though the old healthy instance is still alive.