
kops 1.20 toolbox template no longer handles env variable redirect to --values

Open ValeriiVozniuk opened this issue 3 years ago • 44 comments

1. What kops version are you running? The command kops version will display this information. kops v1.20.0. The issue is reproducible starting from v1.20.0-alpha.1 and up to v1.21.0-alpha.3.

2. What Kubernetes version are you running? kubectl version will print the version if a cluster is running or provide the Kubernetes version specified as a kops flag. kubectl v1.21.0

3. What cloud provider are you using? aws

4. What commands did you run? What is the simplest way to reproduce this issue?

kops toolbox template --name test-cluster --template cluster.tmpl.yaml --format-yaml --values <(echo ${TF_OUTPUT}) >result2.yaml

Files/scripts needed to reproduce the issue:

  1. TF_OUTPUT.json
{
  "kubernetes_cluster_name": {
    "sensitive": false,
    "type": "string",
    "value": "test-cluster"
  },
  "region": {
    "sensitive": false,
    "type": "string",
    "value": "eu-north-1"
  },
  "zone": {
    "sensitive": false,
    "type": "string",
    "value": "a"
  }
}
  2. cluster.tmpl.yaml
apiVersion: kops.k8s.io/v1alpha2
kind: Cluster
metadata:
  creationTimestamp: 2018-11-23T19:42:31Z
  name:  {{$.kubernetes_cluster_name.value}}
spec:
  encryptionConfig: true
  api:
    loadBalancer:
      type: Internal
  authorization:
    rbac: {}
  channel: stable
  cloudLabels:
    Service Role: k8s
    Stack: production
  cloudProvider: aws
  configBase: s3://config-bucket/v3/{{$.kubernetes_cluster_name.value}}
  dnsZone: {{$.kubernetes_cluster_name.value}}
  docker:
    storage: overlay2,overlay,aufs
    version: 19.03.4
  etcdClusters:
  - etcdMembers:
    - instanceGroup: master-{{$.region.value}}{{$.zone.value}}-1
      name: "1"
      volumeType: io1
      volumeSize: 200
      volumeIops: 10000
    - instanceGroup: master-{{$.region.value}}{{$.zone.value}}-2
      name: "2"
      volumeType: io1
      volumeSize: 200
      volumeIops: 10000
    - instanceGroup: master-{{$.region.value}}{{$.zone.value}}-3
      name: "3"
      volumeType: io1
      volumeSize: 200
      volumeIops: 10000
    name: main
    version: 3.4.3
  iam:
    allowContainerRegistry: true
    legacy: false
  kubeAPIServer:
    featureGates:
      TTLAfterFinished: "true"
  kubeControllerManager:
    allocateNodeCIDRs: false
    featureGates:
      TTLAfterFinished: "true"
  kubeScheduler:
    featureGates:
      TTLAfterFinished: "true"
  kubelet:
    anonymousAuth: false
    authenticationTokenWebhook: true
    authorizationMode: Webhook
    cloudProvider: aws
    featureGates:
      TTLAfterFinished: "true"
  kubernetesApiAccess:
  - 0.0.0.0/0
  kubernetesVersion: 1.19.10
  masterPublicName: api.{{$.kubernetes_cluster_name.value}}
  3. test.sh
kops toolbox template --name test-cluster --template cluster.tmpl.yaml --format-yaml --values TF_OUTPUT.json >result1.yaml

TF_OUTPUT="{
  "kubernetes_cluster_name": {
    "sensitive": false,
    "type": "string",
    "value": "test-cluster"
  },
  "region": {
    "sensitive": false,
    "type": "string",
    "value": "eu-north-1"
  },
  "zone": {
    "sensitive": false,
    "type": "string",
    "value": "a"
  }
}"

${kops} toolbox template --name test-cluster --template cluster.tmpl.yaml --format-yaml --values <(echo ${TF_OUTPUT}) >result2.yaml
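
For anyone reproducing this, here is a quoting-safe sketch of the same script (the inner double quotes in the assignment above otherwise get stripped by the shell). The file-based run is the one that succeeds; the process-substitution run is the one that fails:

#!/usr/bin/env bash
set -euo pipefail

# Case 1: values from a regular file -- renders successfully on kops 1.20+.
kops toolbox template --name test-cluster --template cluster.tmpl.yaml --format-yaml \
  --values TF_OUTPUT.json > result1.yaml

# Case 2: the same values fed from an environment variable via process substitution --
# fails on kops 1.20+ with 'map has no entry for key "kubernetes_cluster_name"'.
TF_OUTPUT=$(cat TF_OUTPUT.json)
kops toolbox template --name test-cluster --template cluster.tmpl.yaml --format-yaml \
  --values <(echo "${TF_OUTPUT}") > result2.yaml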

5. What happened after the commands executed?

unable to render template: cluster.tmpl.yaml, error: template: mainTemplate:5:12: executing "mainTemplate" at <$.kubernetes_cluster_name.value>: map has no entry for key "kubernetes_cluster_name"

6. What did you expect to happen? The template file to be processed successfully.

7. Please provide your cluster manifest. Execute kops get --name my.example.com -o yaml to display your cluster manifest. You may want to remove your cluster name and other sensitive information.

See the template file in section 4

8. Please run the commands with most verbose logging by adding the -v 10 flag. Paste the logs into this report, or in a gist and provide the gist link here.

+ /opt/kops/v1.20.0/kops -v 20 toolbox template --name test-cluster --template cluster.tmpl.yaml --format-yaml --values /dev/fd/63
++ echo '{' kubernetes_cluster_name: '{' sensitive: false, type: string, value: test-cluster '},' region: '{' sensitive: false, type: string, value: eu-north-1 '},' zone: '{' sensitive: false, type: string, value: a '}' '}'
I0507 14:26:40.592078   13980 channel.go:105] resolving "stable" against default channel location "https://raw.githubusercontent.com/kubernetes/kops/master/channels/"
I0507 14:26:40.592313   13980 channel.go:110] Loading channel from "https://raw.githubusercontent.com/kubernetes/kops/master/channels/stable"
I0507 14:26:40.592409   13980 context.go:216] Performing HTTP request: GET https://raw.githubusercontent.com/kubernetes/kops/master/channels/stable
I0507 14:26:41.004025   13980 channel.go:119] Channel contents: spec:
  images:
    # We put the "legacy" version first, for kops versions that don't support versions ( < 1.5.0 )
    - name: kope.io/k8s-1.4-debian-jessie-amd64-hvm-ebs-2017-07-28
      providerID: aws
      architectureID: amd64
      kubernetesVersion: ">=1.4.0 <1.5.0"
    - name: kope.io/k8s-1.5-debian-jessie-amd64-hvm-ebs-2018-08-17
      providerID: aws
      architectureID: amd64
      kubernetesVersion: ">=1.5.0 <1.6.0"
    - name: kope.io/k8s-1.6-debian-jessie-amd64-hvm-ebs-2018-08-17
      providerID: aws
      architectureID: amd64
      kubernetesVersion: ">=1.6.0 <1.7.0"
    - name: kope.io/k8s-1.7-debian-jessie-amd64-hvm-ebs-2018-08-17
      providerID: aws
      architectureID: amd64
      kubernetesVersion: ">=1.7.0 <1.8.0"
    - name: kope.io/k8s-1.8-debian-stretch-amd64-hvm-ebs-2018-08-17
      providerID: aws
      architectureID: amd64
      kubernetesVersion: ">=1.8.0 <1.9.0"
    - name: kope.io/k8s-1.9-debian-stretch-amd64-hvm-ebs-2018-08-17
      providerID: aws
      architectureID: amd64
      kubernetesVersion: ">=1.9.0 <1.10.0"
    - name: kope.io/k8s-1.10-debian-stretch-amd64-hvm-ebs-2018-08-17
      providerID: aws
      architectureID: amd64
      kubernetesVersion: ">=1.10.0 <1.11.0"
    # Stretch is the default for 1.11 (for nvme)
    - name: kope.io/k8s-1.11-debian-stretch-amd64-hvm-ebs-2021-02-05
      providerID: aws
      architectureID: amd64
      kubernetesVersion: ">=1.11.0 <1.12.0"
    - name: kope.io/k8s-1.12-debian-stretch-amd64-hvm-ebs-2021-02-05
      providerID: aws
      architectureID: amd64
      kubernetesVersion: ">=1.12.0 <1.13.0"
    - name: kope.io/k8s-1.13-debian-stretch-amd64-hvm-ebs-2021-02-05
      providerID: aws
      architectureID: amd64
      kubernetesVersion: ">=1.13.0 <1.14.0"
    - name: kope.io/k8s-1.14-debian-stretch-amd64-hvm-ebs-2021-02-05
      providerID: aws
      architectureID: amd64
      kubernetesVersion: ">=1.14.0 <1.15.0"
    - name: kope.io/k8s-1.15-debian-stretch-amd64-hvm-ebs-2021-02-05
      providerID: aws
      architectureID: amd64
      kubernetesVersion: ">=1.15.0 <1.16.0"
    - name: kope.io/k8s-1.16-debian-stretch-amd64-hvm-ebs-2021-02-05
      providerID: aws
      architectureID: amd64
      kubernetesVersion: ">=1.16.0 <1.17.0"
    - name: kope.io/k8s-1.17-debian-stretch-amd64-hvm-ebs-2021-02-05
      providerID: aws
      architectureID: amd64
      kubernetesVersion: ">=1.17.0 <1.18.0"
    - name: 099720109477/ubuntu/images/hvm-ssd/ubuntu-focal-20.04-amd64-server-20210415
      providerID: aws
      architectureID: amd64
      kubernetesVersion: ">=1.18.0"
    - name: 099720109477/ubuntu/images/hvm-ssd/ubuntu-focal-20.04-arm64-server-20210415
      providerID: aws
      architectureID: arm64
      kubernetesVersion: ">=1.20.0"
    - name: cos-cloud/cos-stable-65-10323-99-0
      providerID: gce
      architectureID: amd64
      kubernetesVersion: "<1.16.0-alpha.1"
    - name: "cos-cloud/cos-stable-77-12371-114-0"
      providerID: gce
      architectureID: amd64
      kubernetesVersion: ">=1.16.0 <1.18.0"
    - name: ubuntu-os-cloud/ubuntu-2004-focal-v20210415
      providerID: gce
      architectureID: amd64
      kubernetesVersion: ">=1.18.0"
    - name: Canonical:0001-com-ubuntu-server-focal:20_04-lts-gen2:20.04.202104150
      providerID: azure
      architectureID: amd64
      kubernetesVersion: ">=1.20.0"
  cluster:
    kubernetesVersion: v1.5.8
    networking:
      kubenet: {}
  kubernetesVersions:
  - range: ">=1.20.0"
    recommendedVersion: 1.20.6
    requiredVersion: 1.20.0
  - range: ">=1.19.0"
    recommendedVersion: 1.19.10
    requiredVersion: 1.19.0
  - range: ">=1.18.0"
    recommendedVersion: 1.18.18
    requiredVersion: 1.18.0
  - range: ">=1.17.0"
    recommendedVersion: 1.17.17
    requiredVersion: 1.17.0
  - range: ">=1.16.0"
    recommendedVersion: 1.16.15
    requiredVersion: 1.16.0
  - range: ">=1.15.0"
    recommendedVersion: 1.15.12
    requiredVersion: 1.15.0
  - range: ">=1.14.0"
    recommendedVersion: 1.14.10
    requiredVersion: 1.14.0
  - range: ">=1.13.0"
    recommendedVersion: 1.13.12
    requiredVersion: 1.13.0
  - range: ">=1.12.0"
    recommendedVersion: 1.12.10
    requiredVersion: 1.12.0
  - range: ">=1.11.0"
    recommendedVersion: 1.11.10
    requiredVersion: 1.11.0
  - range: "<1.11.0"
    recommendedVersion: 1.11.10
    requiredVersion: 1.11.10
  kopsVersions:
  - range: ">=1.20.0-alpha.1"
    recommendedVersion: "1.20.0"
    #requiredVersion: 1.20.0
    kubernetesVersion: 1.20.6
  - range: ">=1.19.0-alpha.1"
    recommendedVersion: "1.20.0"
    #requiredVersion: 1.19.0
    kubernetesVersion: 1.19.10
  - range: ">=1.18.0-alpha.1"
    recommendedVersion: "1.20.0"
    #requiredVersion: 1.18.0
    kubernetesVersion: 1.18.18
  - range: ">=1.17.0-alpha.1"
    recommendedVersion: "1.20.0"
    #requiredVersion: 1.17.0
    kubernetesVersion: 1.17.17
  - range: ">=1.16.0-alpha.1"
    recommendedVersion: "1.20.0"
    #requiredVersion: 1.16.0
    kubernetesVersion: 1.16.15
  - range: ">=1.15.0-alpha.1"
    recommendedVersion: "1.20.0"
    #requiredVersion: 1.15.0
    kubernetesVersion: 1.15.12
  - range: ">=1.14.0-alpha.1"
    #recommendedVersion: "1.14.0"
    #requiredVersion: 1.14.0
    kubernetesVersion: 1.14.10
  - range: ">=1.13.0-alpha.1"
    #recommendedVersion: "1.13.0"
    #requiredVersion: 1.13.0
    kubernetesVersion: 1.13.12
  - range: ">=1.12.0-alpha.1"
    recommendedVersion: "1.12.1"
    #requiredVersion: 1.12.0
    kubernetesVersion: 1.12.10
  - range: ">=1.11.0-alpha.1"
    recommendedVersion: "1.11.1"
    #requiredVersion: 1.11.0
    kubernetesVersion: 1.11.10
  - range: "<1.11.0-alpha.1"
    recommendedVersion: "1.11.1"
    #requiredVersion: 1.10.0
    kubernetesVersion: 1.11.10

unable to render template: cluster.tmpl.yaml, error: template: mainTemplate:5:12: executing "mainTemplate" at <$.kubernetes_cluster_name.value>: map has no entry for key "kubernetes_cluster_name"

9. Anything else do we need to know?

The issue shows up when the data is redirected to --values from an environment variable; if I supply the same data from a local file, the template is rendered successfully. The issue is not present in kops 1.19.1/1.19.2, where both ways of rendering the template work, but it is present in every 1.2x.x version starting from v1.20.0-alpha.1.

ValeriiVozniuk avatar May 07 '21 11:05 ValeriiVozniuk

v1.21.0-beta.1/v1.22.0-alpha.1 are also affected

ValeriiVozniuk avatar May 11 '21 07:05 ValeriiVozniuk

kOps uses helm for its values parsing and template rendering. Based on the versions you mentioned this is likely due to the helm library being upgraded to v3. Can you update cluster.tmpl.yaml to be just {{ . }} and see if that succeeds and what the resulting file contains?
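
In other words, a throwaway debug template along these lines (a sketch; --format-yaml is left off because the dumped map is not valid YAML, and TF_OUTPUT is set as in the script from section 4):

# Dump the merged values map that the template actually receives.
echo '{{ . }}' > debug.tmpl.yaml
kops toolbox template --name test-cluster --template debug.tmpl.yaml --values TF_OUTPUT.json
kops toolbox template --name test-cluster --template debug.tmpl.yaml --values <(echo "${TF_OUTPUT}")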

rifelpet avatar May 12 '21 02:05 rifelpet

--values takes a path to a values file. Your first run works because that is what you are passing. The second run does not, because you are feeding the content of your variable through process substitution rather than a regular file.

Older kops may have accepted values on stdin by default if this used to work, while the change to v3 may have lost that functionality.
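
For what it's worth, the process substitution expands to a one-shot file-descriptor path rather than a regular file; a quick illustration (bash on Linux):

# Process substitution hands the command a path, not stdin:
echo <(echo '{"a": 1}')    # prints something like /dev/fd/63

# That path is backed by a pipe, so it can effectively be read only once and
# does not behave like a seekable regular file such as TF_OUTPUT.json.
cat <(echo '{"a": 1}')     # the first (and only) read drains the pipe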

olemarkus avatar May 12 '21 06:05 olemarkus

kOps uses helm for its values parsing and template rendering. Based on the versions you mentioned this is likely due to the helm library being upgraded to v3. Can you update cluster.tmpl.yaml to be just {{ . }} and see if that succeeds and what the resulting file contains?

Hi rifelpet. In the case with files:

map[clusterName:test-cluster kubernetes_cluster_name:map[sensitive:false type:string value:test-cluster] region:map[sensitive:false type:string value:eu-north-1] zone:map[sensitive:false type:string value:a]]

In the case with the redirect:

map[clusterName:test-cluster]

On 1.19.x they match

ValeriiVozniuk avatar May 12 '21 07:05 ValeriiVozniuk

--values takes a path to a values file. Your first run works because that is what you are doing. The second run does not as you are redirecting the content of your variable to stdin.

Older kops may have accepted values on stdin by default if this used to work, while the change to v3 may have lost that functionality.

Hi olemarkus, any chance this could be fixed? Writing several files from env variables at runtime doesn't look like a good thing to do. Yes, writing them out can be used as a workaround, but I think it would be better to have a proper fix.

ValeriiVozniuk avatar May 12 '21 07:05 ValeriiVozniuk

It can and should be fixed. It is not high on my priority list, but I would absolutely review a PR if anyone is willing to have a look.

/kind bug

olemarkus avatar May 14 '21 07:05 olemarkus

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Aug 12 '21 08:08 k8s-triage-robot

/remove-lifecycle stale

ValeriiVozniuk avatar Aug 12 '21 08:08 ValeriiVozniuk

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Nov 10 '21 08:11 k8s-triage-robot

/remove-lifecycle stale

ValeriiVozniuk avatar Nov 10 '21 08:11 ValeriiVozniuk

I'm also hitting this problem. This worked up until kops 1.19 and stopped working at 1.20:

kops toolbox template --values <( echo ${tf_output})

where tf_output=$(terraform output -json)

Is there a workaround? It's still present in kops 1.22.1

georgekaz avatar Nov 10 '21 13:11 georgekaz

echo "${tf_output}" > tf_output.json
kops toolbox template --values tf_output.json
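
A slightly tidier variant of the same workaround, using a temporary file that is removed on exit (a sketch reusing the flags from the original report):

# --values needs a real file path on kops 1.20+, so materialize the values in a temp file.
values_file=$(mktemp)
trap 'rm -f "${values_file}"' EXIT

printf '%s\n' "${tf_output}" > "${values_file}"

kops toolbox template --name test-cluster --template cluster.tmpl.yaml --format-yaml \
  --values "${values_file}" > result.yaml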

ValeriiVozniuk avatar Nov 10 '21 13:11 ValeriiVozniuk

Thank you :-) Simple enough. A little annoying to have to do it that way, but I suppose it's backwards compatible at least.

Do you know if this still works? kops toolbox template --values /dev/stdin ?

georgekaz avatar Nov 10 '21 13:11 georgekaz

Didn't try, so I have no idea :)

ValeriiVozniuk avatar Nov 10 '21 13:11 ValeriiVozniuk

fair enough :-)

georgekaz avatar Nov 10 '21 13:11 georgekaz

I've discovered the answer is no, it doesn't. This also used to work and no longer does (piping in a decrypted secrets file):

sops -d secrets.enc | pv -q | kops toolbox template --values /dev/stdin

Ideally these secrets would never have to exist decrypted on disk (although they do end up in the template)

georgekaz avatar Nov 10 '21 14:11 georgekaz

kops <= 1.19 used to read the file directly and unmarshal the YAML before merging the resulting data into the remaining values:

https://github.com/kubernetes/kops/commit/24c9d03477c2278e527f418a0546cb2111f7e924#diff-29ad7670548ff6fb5c24ab9f84e2d6d4ecd8d571a003b33561ffc4e3d8d8b6d8R245-L253

kops >= 1.20 uses helm3's valueOpts.MergeValues, which has special handling for stdin via -:

https://github.com/helm/helm/blob/b6a04cfbd544f0bbeea449129c1497eec8d99e2b/pkg/cli/values/options.go#L109-L111

Can you try --values=- instead of --values /dev/stdin?
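
For reference, that suggestion amounts to piping the values on stdin and passing - as the path, which is helm3's convention (a sketch with tf_output as above; the replies below report how it actually behaves with kops):

echo "${tf_output}" | kops toolbox template --name test-cluster --template cluster.tmpl.yaml \
  --format-yaml --values=- > result.yaml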

rifelpet avatar Nov 11 '21 00:11 rifelpet

Not sure if the last comment was for georgekaz only, but in my case it doesn't work, giving a stat -/dev/fd/63: no such file or directory error.

ValeriiVozniuk avatar Nov 11 '21 09:11 ValeriiVozniuk

I get similar:

stat -: no such file or directory

georgekaz avatar Nov 11 '21 10:11 georgekaz

I'm finding more places where this is breaking my scripts. I was using kops toolbox template to prepare the EncryptionConfig file, but this no longer works (this is the abridged version):

aescbc_secret=$(head -c 32 /dev/urandom | base64)

newEncryptionConfig=$(kops toolbox template \
    --template "${encryptionConfigurationtemplate}" \
    --values <(echo "aescbc_secret: ${aescbc_secret}"))

echo "$newEncryptionConfig" | kops create secret encryptionconfig -f /dev/stdin --force

It's all quite inconvenient. Do you think there's any likelihood it'll be fixed, or is this just how it is going to be now?

georgekaz avatar Nov 11 '21 17:11 georgekaz

For individual key/value pairs, you can use --set and --set-string:

newEncryptionConfig=$(kops toolbox template \
    --template "${encryptionConfigurationtemplate}" \
    --set "aescbc_secret=${aescbc_secret}")

rifelpet avatar Nov 11 '21 17:11 rifelpet

Thanks, that's true. I am actually using --values <(echo ${tf_output}) in this same command to pull KMS details from Terraform, but we've already gone over that one.

georgekaz avatar Nov 11 '21 17:11 georgekaz

Is the reason for this change to support helm templating functions in the kops cluster templates, or something else?

georgekaz avatar Nov 11 '21 17:11 georgekaz

It was a long time ago, but I believe the intention of the upgrade from helm2 to helm3 was to move to a supported version of the helm libraries (in case there are security fixes we need to pull in) and to take advantage of the newer templating functions available in helm3.

rifelpet avatar Nov 11 '21 23:11 rifelpet

It definitely makes sense to move to a supported version of helm, but from my experience kops never supported helm templating functions even with helm2; it has only ever supported standard Go templates. That's why I was wondering if it now supports helm functions, but https://kops.sigs.k8s.io/operations/cluster_template/ still says "The file passed as --template must be a go template."

georgekaz avatar Nov 12 '21 10:11 georgekaz

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Feb 10 '22 10:02 k8s-triage-robot

/remove-lifecycle stale

ValeriiVozniuk avatar Feb 10 '22 11:02 ValeriiVozniuk

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar May 11 '22 11:05 k8s-triage-robot

/remove-lifecycle stale

ValeriiVozniuk avatar May 11 '22 18:05 ValeriiVozniuk

What exactly is expected from this issue? We've already been through that "env redirect to flag" is not a thing.

olemarkus avatar May 11 '22 18:05 olemarkus