amazon-vpc-cni-k8s icon indicating copy to clipboard operation
amazon-vpc-cni-k8s copied to clipboard

entry point script migration to golang

Open jayanthvn opened this issue 3 years ago • 11 comments

What type of PR is this? Enhancement

Which issue does this PR fix: N/A

What does this PR do / Why do we need it: Rewrite the entry point shell scripts in golang

This is first part of the change to update to minimal distro image.

If an issue # is not available please add repro steps and logs from IPAMD/CNI showing the issue: N/A

Testing done on this change:

Yes

Test Cases

Startup Aws-node


dev-dsk-varavaj-2b-fdf1da64 % kubectl logs aws-node-55brn -n kube-system
time="2021-11-02T22:12:16Z" level=info msg="Install CNI binaries.."
time="2021-11-02T22:12:16Z" level=info msg="Starting IPAM daemon ... "
time="2021-11-02T22:12:16Z" level=info msg="Checking for IPAM connectivity ... "
time="2021-11-02T22:12:17Z" level=info msg="Copying config file ... "
time="2021-11-02T22:12:17Z" level=info msg="Successfully copied CNI plugin binary and config file."

Startup Init-container

dev-dsk-varavaj-2b-fdf1da64 % kubectl logs aws-node-8qk9p -n kube-system -c aws-vpc-cni-init
Installed /host/opt/cni/bin/loopback
time="2021-11-02T22:24:23Z" level=info msg="Copying CNI plugin binaries ..."
Installed /host/opt/cni/bin/portmap
Installed /host/opt/cni/bin/bandwidth
Installed /host/opt/cni/bin/aws-cni-support.sh
time="2021-11-02T22:24:23Z" level=info msg="Copied all CNI plugin binaries to /host/opt/cni/bin\n"
time="2021-11-02T22:24:23Z" level=info msg="Found hostIP 192.168.12.6 and primaryMAC 0a:5e:a9:0e:e5:f3"
time="2021-11-02T22:24:23Z" level=info msg="Found primaryIF eth0"
time="2021-11-02T22:24:23Z" level=info msg="Updated entry for 2"
time="2021-11-02T22:24:23Z" level=info msg="Updated entry for 1"
time="2021-11-02T22:24:23Z" level=info msg="CNI init container done"

PD is Enable but WARM targets = 0

dev-dsk-varavaj-2b-fdf1da64 % kubectl logs aws-node-8nx4j -n kube-system
time="2021-11-02T21:51:30Z" level=error msg="Setting WARM_PREFIX_TARGET = 0 is not supported while WARM_IP_TARGET/MINIMUM_IP_TARGET is not set. Please configure either one of the WARM_{PREFIX/IP}_TARGET or MINIMUM_IP_TARGET env variables"

PD is enabled but Warm target = -1

dev-dsk-varavaj-2b-fdf1da64 % kubectl logs aws-node-d8vv4 -n kube-system
time="2021-11-02T21:53:19Z" level=error msg="Setting WARM_PREFIX_TARGET = 0 is not supported while WARM_IP_TARGET/MINIMUM_IP_TARGET is not set. Please configure either one of the WARM_{PREFIX/IP}_TARGET or MINIMUM_IP_TARGET env variables"

Plugin log file = stdout

dev-dsk-varavaj-2b-fdf1da64 % kubectl logs aws-node-7xtgh -n kube-system
time="2021-11-02T21:55:29Z" level=error msg="AWS_VPC_K8S_PLUGIN_LOG_FILE cannot be set to stdout"

Veth prefix > 4

kubectl logs aws-node-vb8q5 -n kube-system
time="2021-11-02T21:57:29Z" level=error msg="AWS_VPC_K8S_CNI_VETHPREFIX cannot be longer than 4 characters"

Veth prefix = lo

kubectl logs aws-node-xwjvv -n kube-system
time="2021-11-02T21:58:28Z" level=error msg="AWS_VPC_K8S_CNI_VETHPREFIX cannot be set to reserved values eth or vlan or lo"

Created aws.conf file and restarted aws-node

[root@ip-192-168-12-6 net.d]# touch aws.conf
[root@ip-192-168-12-6 net.d]# pwd
/etc/cni/net.d
[root@ip-192-168-12-6 net.d]# ls
10-aws.conflist  aws.conf

File is delete -

[root@ip-192-168-12-6 net.d]# ls
10-aws.conflist

Enable bw plugin -

cat /etc/cni/net.d/10-aws.conflist
{
  "cniVersion": "0.3.1",
  "name": "aws-cni",
  "plugins": [
    {
      "name": "aws-cni",
      "type": "aws-cni",
      "vethPrefix": "eni",
      "mtu": "9001",
      "pluginLogFile": "/host/var/log/aws-routed-eni/plugin.log",
      "pluginLogLevel": "DEBUG"
    },
    {
      "name": "egress-v4-cni",
      "type": "egress-v4-cni",
      "ipam": {
        "type": "host-local",
        "routes": [
          {
            "dst": "0.0.0.0/0"
          }
        ],
        "dataDir": "/run/cni/v6pd/egress-v4-ipam",
        "ranges": [
          [
            {
              "subnet": "169.254.172.0/22"
            }
          ]
        ]
      },
      "enabled": "false",
      "nodeIP": "192.168.12.6",
      "mtu": "9001",
      "pluginLogFile": "/var/log/aws-routed-eni/egress-v4-plugin.log",
      "pluginLogLevel": "DEBUG"
    },
    {
      "type": "portmap",
      "capabilities": {
        "portMappings": true
      }
    },
    {
      "type": "bandwidth",
      "capabilities": {
        "bandwidth": true
      }
    }
  ]
}

Disable bw plugin -

[root@ip-192-168-12-6 ec2-user]# cat /etc/cni/net.d/10-aws.conflist
{
  "cniVersion": "0.3.1",
  "name": "aws-cni",
  "plugins": [
    {
      "name": "aws-cni",
      "type": "aws-cni",
      "vethPrefix": "eni",
      "mtu": "9001",
      "pluginLogFile": "/host/var/log/aws-routed-eni/plugin.log",
      "pluginLogLevel": "DEBUG"
    },
    {
      "name": "egress-v4-cni",
      "type": "egress-v4-cni",
      "mtu": "9001",
      "enabled": "false",
      "nodeIP": "192.168.12.6",
      "ipam": {
         "type": "host-local",
         "ranges": [[{"subnet": "169.254.172.0/22"}]],
         "routes": [{"dst": "0.0.0.0/0"}],
         "dataDir": "/run/cni/v6pd/egress-v4-ipam"
      },
      "pluginLogFile": "/var/log/aws-routed-eni/egress-v4-plugin.log",
      "pluginLogLevel": "DEBUG"
    },
    {
      "type": "portmap",
      "capabilities": {"portMappings": true},
      "snat": true
    }
  ]
}

Automation added to e2e:

Work in progress

Will this break upgrades or downgrades. Has updating a running cluster been tested?: No

Does this change require updates to the CNI daemonset config files to work?:

No

Does this PR introduce any user-facing change?:

No


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

jayanthvn avatar Nov 05 '21 15:11 jayanthvn

Few high level comments before going into code details

  1. Can we build/test new distro based ARM aws-node binary on ARM instances.
  2. Can we move entry _point script changes to separate diff as this will be useful for AL2 and other images as well and not specific to minimal distro ?

srini-ram avatar Nov 15 '21 23:11 srini-ram

Few high level comments before going into code details

  1. Can we build/test new distro based ARM aws-node binary on ARM instances.
  2. Can we move entry _point script changes to separate diff as this will be useful for AL2 and other images as well and not specific to minimal distro ?

Yes Srini will test on ARM nodes.

Reg keeping the changes separate is not possible since the minimal distro image doesn't have a shell to run the entry point script and also few packages like js which is needed for the script.

jayanthvn avatar Nov 18 '21 19:11 jayanthvn

Few high level comments before going into code details

  1. Can we build/test new distro based ARM aws-node binary on ARM instances.
  2. Can we move entry _point script changes to separate diff as this will be useful for AL2 and other images as well and not specific to minimal distro ?

Yes Srini will test on ARM nodes.

Reg keeping the changes separate is not possible since the minimal distro image doesn't have a shell to run the entry point script and also few packages like js which is needed for the script.

Jayanth, I understand that script migration is a pre-req for minimal distro. But i was suggesting following order of changes

1/ Commit the script migration as a separate change so that AL2 can leverage this. [ This would require exhaustive testing and we can enhance our AL2 test suite as well] . This can potentially be bundled into CNI 10.1.2 release 2/ As a next step, we could bring in minimal distro changes and this can be bundled into CNI 10.1.3

This would help us isolate regression and unify API behavior across AL2/Minimal distro. As an example, AWS SDK attempts IMDSv2 and then falls back to IMDSv1. However, CNI 1.10.1 (AL2) entry point script doesn't do that.

srini-ram avatar Nov 18 '21 23:11 srini-ram

Sure Srini I can separate it to 2 PRs.

jayanthvn avatar Nov 19 '21 16:11 jayanthvn

@srini-ram - Updated the PR to have the AL2 image. Will open a new PR for minimal distro once this is merged.

Automation will be added as a fast follow up. Will discuss about the test plan offline.

jayanthvn avatar Nov 21 '21 01:11 jayanthvn

This pull request is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 14 days

github-actions[bot] avatar May 07 '22 00:05 github-actions[bot]

/not stale

jayanthvn avatar May 09 '22 17:05 jayanthvn

This pull request is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 14 days

github-actions[bot] avatar Jul 10 '22 00:07 github-actions[bot]

This pull request is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 14 days

github-actions[bot] avatar Oct 04 '22 00:10 github-actions[bot]

/not stale

jayanthvn avatar Oct 12 '22 15:10 jayanthvn

Overall content looks good to me, just a few nits. I can review further after the conflicts are resolved and jobs pass.

Quick question, any interest in pulling in go modules for cp and sysctl instead of implementing our own? I can see either side, I guess it depends on how specific this use is.

There were some 3P implementation for cp and sysctl, hence I added utils similar to upstream K8s.

jayanthvn avatar Oct 13 '22 18:10 jayanthvn

Closing this PR since including the changes in #2146

jayanthvn avatar Nov 23 '22 18:11 jayanthvn