pangeo-cloud-federation
hub resource requests not being overridden by config/common.yaml
We are squeezing all hub and core pods onto a single m5.large instance for the nasa deployment, but some pods aren't scheduling because we're requesting over 100% of the node's CPU. It seems like there might be a bug in how the hub resource requests are overridden here:
https://github.com/pangeo-data/pangeo-cloud-federation/blob/074777d914baaf61669d935967cd6647625ae8bb/deployments/nasa/config/common.yaml#L88-L95
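For context, those lines set the hub container's resources through the chart values. A minimal sketch of what that override is doing, assuming the usual zero-to-jupyterhub keys are reachable at `jupyterhub.hub.resources` under the parent chart (the exact nesting may differ) and using the 400m figure mentioned later in this thread; the authoritative block is at the link above:

```yaml
# Sketch only -- the real block lives at the linked lines of
# deployments/nasa/config/common.yaml, and the key nesting under the
# pangeo-deploy parent chart is assumed here.
jupyterhub:
  hub:
    resources:
      requests:
        cpu: 0.4       # intended request (400m), below the 500m the pod reports
        memory: 1Gi
      limits:
        cpu: 1.25
        memory: 1Gi
```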
Because the pod has the following config:

```
Containers:
  hub:
    Container ID:  docker://d166f73701a1a3e46860bf8f54d969a1a3ad6d3c6225177347a649e856596bb1
    Image:         jupyterhub/k8s-hub:0.9-445a953
    Image ID:      docker-pullable://jupyterhub/k8s-hub@sha256:0d6412029ad485fc704393d6bf7cea7c29ea5d08aa1a77de5ee65173afe5cd1a
    Port:          8081/TCP
    Host Port:     0/TCP
    Command:
      jupyterhub
      --config
      /srv/jupyterhub_config.py
      --upgrade-db
    State:          Running
      Started:      Mon, 29 Jul 2019 17:47:43 -0700
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     1250m
      memory:  1Gi
    Requests:
      cpu:     500m
      memory:  1Gi
```
Our node currently looks like this:

```
Non-terminated Pods:          (15 in total)
  Namespace     Name                                 CPU Requests  CPU Limits   Memory Requests  Memory Limits  AGE
  ---------     ----                                 ------------  ----------   ---------------  -------------  ---
  kube-system   aws-node-qld9c                       10m (0%)      0 (0%)       0 (0%)           0 (0%)         28h
  kube-system   cluster-autoscaler-848d9c584f-nf5q4  100m (5%)     100m (5%)    300Mi (3%)       300Mi (3%)     28h
  kube-system   coredns-67cdb69b9b-4tvfb             100m (5%)     0 (0%)       70Mi (0%)        170Mi (2%)     28h
  kube-system   coredns-67cdb69b9b-94wln             100m (5%)     0 (0%)       70Mi (0%)        170Mi (2%)     28h
  kube-system   kube-proxy-jsssn                     100m (5%)     0 (0%)       0 (0%)           0 (0%)         28h
  kube-system   tiller-deploy-59d477f79-rjjbj        0 (0%)        0 (0%)       0 (0%)           0 (0%)         28h
  nasa-prod     autohttps-5c66cbb7b-p5q5c            0 (0%)        0 (0%)       0 (0%)           0 (0%)         28h
  nasa-prod     proxy-74869565f5-h7ght               200m (10%)    0 (0%)       512Mi (6%)       0 (0%)         28h
  nasa-prod     user-scheduler-6d66788464-86792      50m (2%)      0 (0%)       256Mi (3%)       0 (0%)         28h
  nasa-prod     user-scheduler-6d66788464-xbjwq      50m (2%)      0 (0%)       256Mi (3%)       0 (0%)         28h
  nasa-staging  autohttps-5c7b9c9b58-6d8r7           0 (0%)        0 (0%)       0 (0%)           0 (0%)         28h
  nasa-staging  hub-696fcc7c6c-m2mm5                 500m (25%)    1250m (62%)  1Gi (13%)        1Gi (13%)      11m
  nasa-staging  proxy-56c6b94475-nccgr               200m (10%)    0 (0%)       512Mi (6%)       0 (0%)         12h
  nasa-staging  user-scheduler-7d46c957bc-cc9rt      50m (2%)      0 (0%)       256Mi (3%)       0 (0%)         12h
  nasa-staging  user-scheduler-7d46c957bc-d8gpk      50m (2%)      0 (0%)       256Mi (3%)       0 (0%)         12h
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource                    Requests      Limits
  --------                    --------      ------
  cpu                         1510m (75%)   1350m (67%)
  memory                      3512Mi (46%)  1664Mi (21%)
  ephemeral-storage           0 (0%)        0 (0%)
  attachable-volumes-aws-ebs  0             0
```
And the error when trying to schedule a second hub pod is:

```
Warning  FailedScheduling  13s (x6 over 4m17s)  default-scheduler  0/2 nodes are available: 1 Insufficient cpu, 1 node(s) didn't match node selector.
```
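For the arithmetic behind that message: an m5.large has 2 vCPU, so roughly 2000m allocatable, and with 1510m already requested only about 490m is free, just under the hub's 500m request. The summary above comes from describing the node, roughly (node name is a placeholder):

```bash
# Show the per-node request/limit summary behind the table above;
# 1510m of ~2000m allocatable CPU is already requested, leaving ~490m,
# which is less than the hub's 500m request -- hence "Insufficient cpu".
kubectl describe node <node-name> | grep -A 8 "Allocated resources"
```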
cc @jhamman
Looks to me like you are requesting 0.5 CPU and you have 0.49 left on this node, nominally. (I don't think it will give you exactly 100%, possibly because of some overhead requirements.) Anyway, this is the behavior I would expect. Am I not understanding something?
The main issue is that I requested 400m CPU in `common.yaml` so that it would fit, but the pod still requests 500m.
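A quick way to confirm the mismatch (a sketch, assuming the hub Deployment is named `hub` and the hub container is first in the pod spec) is to ask the live Deployment what it is requesting:

```bash
# Print the resources block of the running hub Deployment in staging;
# after a successful upgrade this should show the 400m request, but it
# currently reports 500m.
kubectl -n nasa-staging get deployment hub \
  -o jsonpath='{.spec.template.spec.containers[0].resources}'
```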
So, looking into this a little more, it seems related to the fact that `helm upgrade` can end up in an inconsistent state if a deployment fails at some point and other upgrades are then done on top. In brief, I think what happens is that helm only compares against "config - 1" (the previously recorded release config) and has some trouble with the deployments and replicasets that define the hub resources. So if the configuration with the new resource limits fails to deploy for some reason, subsequent `helm upgrade`s don't see any difference in the chart's resource limits and don't modify the deployment. See:
https://github.com/helm/helm/issues/1873#issuecomment-429460214
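One way to see what Tiller actually has on record for the release (release name taken from the failure messages below) is roughly:

```bash
# Values Tiller has stored for the current revision of the release
helm get values nasa-staging

# Revision history for the release (output shown below)
helm history nasa-staging
```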
Here is our helm history:

```
149  Sun Jul 28 15:24:22 2019  SUPERSEDED  pangeo-deploy-0.1.0  Upgrade complete
150  Mon Jul 29 05:10:41 2019  FAILED      pangeo-deploy-0.1.0  Upgrade "nasa-staging" failed: timed out waiting for the ...
151  Mon Jul 29 06:16:28 2019  FAILED      pangeo-deploy-0.1.0  Upgrade "nasa-staging" failed: kind DaemonSet with the na...
152  Mon Jul 29 17:32:45 2019  FAILED      pangeo-deploy-0.1.0  Upgrade "nasa-staging" failed: kind DaemonSet with the na...
153  Mon Jul 29 17:44:43 2019  SUPERSEDED  pangeo-deploy-0.1.0  Upgrade complete
154  Mon Jul 29 21:54:58 2019  SUPERSEDED  pangeo-deploy-0.1.0  Upgrade complete
155  Mon Jul 29 22:23:11 2019  DEPLOYED    pangeo-deploy-0.1.0  Upgrade complete
```
So we could roll back to revision 149 and redeploy with a new commit to staging, or probably just slightly tweak the resource requests and redeploy, which should also work (but might not be the cleanest approach).
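If we go the rollback route, it would be roughly the following (revision number from the history above), followed by a fresh deploy from CI:

```bash
# Roll the release back to the last revision that deployed cleanly,
# then push a new commit so CircleCI/hubploy upgrades on top of it
helm rollback nasa-staging 149
```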
There are all sorts of helpful flags for `helm upgrade`, such as `--reset-values`, `--recreate-pods`, and `--force`, to name a few that I use fairly often. I think `--reset-values` might solve the problem you are having. More on the various flags here. But by using hubploy in CircleCI we don't really have an easy way to pass additional flags without modifying the source here, which is pretty kludgy. I wonder if @yuvipanda has thought about allowing the use of some of these flags in the hubploy code.