
Fix autoprovisioning with spot nodes

Open • avrittrohwer opened this issue 5 months ago • 1 comment

Fixes / Features

  • Fixes workload rendering when using spot. Without this change, `xpk workload create` fails with an error like:

    [XPK] Waiting for `Creating Workload`, for 0 seconds
    error: error parsing /tmp/tmp242uhnfs: error converting YAML to JSON: yaml: line 33: could not find expected ':'
    [XPK] Task: `Creating Workload` terminated with code `1`
    
  • Adds the required pod tolerations when using node auto-provisioning with spot nodes. Without these tolerations, the cluster autoscaler will not create new spot node pools.
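
As a rough illustration of the second fix, the sketch below builds the pod spec fields a spot workload would need. It is not the code from this PR: the helper name `spot_scheduling_fields` is hypothetical, and the `cloud.google.com/gke-spot` key and `NoSchedule` effect are assumed from GKE's documented spot label and taint, not taken from the xpk templates.

```python
import json

# Assumption: the label/taint key GKE documents for Spot VMs.
# It is not copied from the xpk source.
SPOT_KEY = "cloud.google.com/gke-spot"


def spot_scheduling_fields(use_spot: bool) -> dict:
    """Return pod spec fields to merge in for spot scheduling (hypothetical helper).

    For on-demand workloads nothing is added, so the rendered manifest
    carries neither the spot node selector nor the spot toleration.
    """
    if not use_spot:
        return {}
    return {
        # Steer the pod onto spot nodes.
        "nodeSelector": {SPOT_KEY: "true"},
        # Tolerate the spot taint so the pod can be scheduled onto
        # auto-provisioned spot node pools.
        "tolerations": [
            {
                "key": SPOT_KEY,
                "operator": "Equal",
                "value": "true",
                "effect": "NoSchedule",
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(spot_scheduling_fields(True), indent=2))
```

Building the fields as a dictionary and serializing them in one step, rather than splicing text into a template, is one way to avoid the kind of YAML parse error shown above.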

Testing / Documentation

  • [ y ] Tests pass
  • [ y, not needed ] Appropriate changes to documentation are included in the PR

Node auto-provisioning with spot

  1. Created an xpk cluster with --spot and autoprovisioning flags.
  2. Created a workload with a different topology than the cluster default.
  3. Observed a nodepool being created with the new workload topology using spot TPU nodes.

Node auto-provisioning without spot

  1. Created an xpk cluster with --spot and autoprovisioning flags.
  2. Created a workload with a different topology than the cluster default and --on-demand flag.
  3. Validated that the generated YAML does not specify the spot node-selector or tolerations.
  4. Observed a nodepool being created with the new workload topology using on-demand TPU nodes.
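
The validation in step 3 could be automated along these lines. This is a minimal sketch, not part of the PR: `requests_spot` is a hypothetical helper, it expects the pod spec already parsed into a dictionary, and the `cloud.google.com/gke-spot` key is assumed from GKE's documented spot label and taint.

```python
# Assumption: the label/taint key GKE documents for Spot VMs.
SPOT_KEY = "cloud.google.com/gke-spot"


def requests_spot(pod_spec: dict) -> bool:
    """Return True if a parsed pod spec selects or tolerates spot nodes."""
    selector = pod_spec.get("nodeSelector") or {}
    tolerations = pod_spec.get("tolerations") or []
    has_selector = SPOT_KEY in selector
    has_toleration = any(t.get("key") == SPOT_KEY for t in tolerations)
    return has_selector or has_toleration


if __name__ == "__main__":
    # Illustrative on-demand pod spec: no spot selector, no spot toleration.
    on_demand_spec = {"containers": [{"name": "main"}]}
    print(requests_spot(on_demand_spec))
```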

Not auto-provisioning with spot

  1. Created an xpk cluster with --spot flag.
  2. Validated that the nodepool was created with spot nodes.
  3. Created a workload and validated it ran.

avrittrohwer, Sep 18 '24 21:09