Upgrade issue when bringing in kube-registry-proxy
In order to switch over from our in-house registry-proxy to the official/upstream kube-registry-proxy (as the original PR, https://github.com/deis/workflow/pull/734, proposed), we will need to sort out the following upgrade issue.
Testing of the v2.12.0 release candidate showed that, starting from a Workflow install that uses the in-house variant of deis-registry-proxy (say, v2.11.0), an upgrade (helm upgrade luminous-hummingbird workflow-staging/workflow --version v2.12.0) removes the deis-registry-proxy pod as expected, but the new luminous-hummingbird-kube-registry-proxy DaemonSet sometimes fails to schedule any pods due to a host port conflict:
$ helm ls
NAME                   REVISION   UPDATED                    STATUS     CHART              NAMESPACE
luminous-hummingbird   4          Wed Mar 8 14:01:02 2017    DEPLOYED   workflow-v2.12.0   deis
$ kd get po,ds
NAME                                        READY     STATUS    RESTARTS   AGE
po/deis-builder-574483744-qnf44             1/1       Running   0          24m
po/deis-controller-3953262871-jqkmd         1/1       Running   2          24m
po/deis-database-83844344-m5x4x             1/1       Running   0          24m
po/deis-logger-176328999-d7fxc              1/1       Running   9          1h
po/deis-logger-fluentd-0hqfs                1/1       Running   0          1h
po/deis-logger-fluentd-drfh6                1/1       Running   0          1h
po/deis-logger-redis-304849759-nbrdp        1/1       Running   0          1h
po/deis-minio-676004970-g2bj9               1/1       Running   0          1h
po/deis-monitor-grafana-432627134-87b1z     1/1       Running   0          24m
po/deis-monitor-influxdb-2729788615-q67f9   1/1       Running   0          25m
po/deis-monitor-telegraf-6q562              1/1       Running   0          1h
po/deis-monitor-telegraf-rzwnv              1/1       Running   6          1h
po/deis-nsqd-3597503299-94nhx               1/1       Running   0          1h
po/deis-registry-756475849-v0rmw            1/1       Running   0          24m
po/deis-router-1001573613-mk07g             1/1       Running   0          13m
po/deis-workflow-manager-1013677227-kh5vt   1/1       Running   0          25m

NAME                                          DESIRED   CURRENT   READY     NODE-SELECTOR   AGE
ds/deis-logger-fluentd                        2         2         2         <none>          1h
ds/deis-monitor-telegraf                      2         2         2         <none>          1h
ds/luminous-hummingbird-kube-registry-proxy   0         0         0         <none>          24m
$ kd describe ds luminous-hummingbird-kube-registry-proxy
Name:           luminous-hummingbird-kube-registry-proxy
Image(s):       gcr.io/google_containers/kube-registry-proxy:0.4
Selector:       app=luminous-hummingbird-kube-registry-proxy
Node-Selector:  <none>
Labels:         chart=kube-registry-proxy-0.1.0
                heritage=Tiller
                release=luminous-hummingbird
Desired Number of Nodes Scheduled: 0
Current Number of Nodes Scheduled: 0
Number of Nodes Misscheduled: 0
Pods Status:    0 Running / 0 Waiting / 0 Succeeded / 0 Failed
Events:
  FirstSeen   LastSeen   Count   From                      SubObjectPath   Type     Reason            Message
  ---------   --------   -----   ----                      -------------   ----     ------            -------
  25m         25m        2       {daemonset-controller }                   Normal   FailedPlacement   failed to place pod on "k8s-agent-fbf26383-0": host port conflict
  25m         25m        2       {daemonset-controller }                   Normal   FailedPlacement   failed to place pod on "k8s-master-fbf26383-0": host port conflict
Let's see if we can distill this into a base case which we can hopefully ship upstream to Helm as a PR plus a functional test; a rough sketch of one possible base case follows.
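As a sketch only (the chart names, DaemonSet names, and port 5555 below are placeholders I'm making up, not values taken from the real charts), the base case could be two versions of a tiny chart whose only difference is that the DaemonSet binding the host port gets renamed, mimicking the deis-registry-proxy -> kube-registry-proxy swap:

$ # v0.1.0 of the test chart ships a DaemonSet named "old-proxy" with hostPort 5555
$ helm install ./hostport-test-0.1.0.tgz --name hostport-test
$ kubectl get ds,po -l release=hostport-test    # old-proxy pods should land on every node
$
$ # v0.2.0 drops "old-proxy" and adds a DaemonSet "new-proxy" on the same hostPort 5555
$ helm upgrade hostport-test ./hostport-test-0.2.0.tgz
$ kubectl describe ds new-proxy                 # does this show the same "host port conflict"?

If that reproduces, an upstream functional test could assert that the replacement DaemonSet gets placed once the old DaemonSet's pods have actually released the port.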
It is possible that this is due to a k8s regression (I have been running v1.5.x in my testing); perhaps related: https://github.com/kubernetes/kubernetes/issues/23013
Adding this to the v2.15 milestone. We'll want to re-try this on a v1.6.x cluster. As it stands, we've added deis/registry-proxy back into CI as features have come in with the Workflow v2.14 milestone.
This issue was moved to teamhephy/workflow#27