
[sysbox-deploy-k8s] Authentication issue preventing ECR utilization in EKS clusters

Open rodnymolina opened this issue 3 years ago • 3 comments

In K8s scenarios where private registries are used to serve container images, the user usually needs to authenticate through the mechanisms put in place by the cloud vendor of choice. This is typically the case in EKS clusters created through the eksctl tool, where kubelet is configured to fetch the pause image from a local Elastic Container Registry (ECR).

As part of sysbox-deploy-k8s' daemonset execution, there's logic to extract these configuration elements from kubelet and set them accordingly in cri-o's configuration (cri-o and kubelet overlap on a few config attributes such as 'pause-image', 'pause-image-file-authentication', etc).

The problem with this approach is that we need to configure cri-o to authenticate against ECR servers, and for that we need to rely on the AWS CLI tool, which isn't at hand when running within the context of the sysbox-deploy-k8s daemonset. As a consequence, no pod is able to initialize, as can be seen below:

May 09 06:54:41 ip-192-168-14-235 crio[125942]: time="2022-05-09 06:54:41.510707022Z" level=info msg="RunSandbox: releasing container name: k8s_POD_aws-node-5lk92_kube-system_c1678921-5af4-47a0-9632-1e840836e172_0" id=0ec41843-885d-4e91-b635-241d8626fc2c name=/runtime.v1alpha2.RuntimeService/RunPodSandbox
May 09 06:54:41 ip-192-168-14-235 crio[125942]: time="2022-05-09 06:54:41.510780796Z" level=info msg="RunSandbox: releasing container name: k8s_POD_aws-node-5lk92_kube-system_c1678921-5af4-47a0-9632-1e840836e172_0" id=0ec41843-885d-4e91-b635-241d8626fc2c name=/runtime.v1alpha2.RuntimeService/RunPodSandbox
May 09 06:54:41 ip-192-168-14-235 kubelet-eks.daemon[126005]: E0509 06:54:41.511103  126005 remote_runtime.go:116] "RunPodSandbox from runtime service failed" err="rpc error: code = Unknown desc = error creating pod sandbox with name \"k8s_aws-node-5lk92_kube-system_c1678921-5af4-47a0-9632-1e840836e172_0\": Error initializing source docker://602401143452.dkr.ecr.us-west-2.amazonaws.com/eks/pause:3.1-eksbuild.1: Error reading manifest 3.1-eksbuild.1 in 602401143452.dkr.ecr.us-west-2.amazonaws.com/eks/pause: unauthorized: authentication required"

Notice that kubelet is properly configured to use cri-o as its CRI, and the pause attributes are also defined as expected. However, some extra tweaking is required to allow ECR authentication, probably through the pause_image_auth_file crio config attribute. The pending question is how to populate that file with the limited resources available when running within the context of the sysbox-deploy-k8s daemonset.

ubuntu@ip-192-168-14-235:~$ sudo snap get kubelet-eks
Key                           Value
address                       0.0.0.0
anonymous-auth                false
args                          --node-labels=alpha.eksctl.io/nodegroup-name=ubuntu-nodes,alpha.eksctl.io/cluster-name=my-cluster,node-lifecycle=on-demand,alpha.eksctl.io/instance-id=i-06305353ab3dff91c
authentication-token-webhook  true
authorization-mode            Webhook
cgroup-driver                 cgroupfs
client-ca-file                /etc/kubernetes/pki/ca.crt
cloud-provider                aws
cluster-dns                   10.100.0.10
cluster-domain                cluster.local
cni-bin-dir                   /opt/cni/bin
cni-conf-dir                  /etc/cni/net.d
config                        /etc/kubernetes/kubelet/kubelet-config.json
container-runtime             remote
container-runtime-endpoint    unix:///var/run/crio/crio.sock
feature-gates                 RotateKubeletServerCertificate=true
kubeconfig                    /var/lib/kubelet/kubeconfig
max-pods                      29
network-plugin                cni
node-ip                       192.168.14.235
pod-infra-container-image     602401143452.dkr.ecr.us-west-2.amazonaws.com/eks/pause:3.1-eksbuild.1
register-node                 true
resolv-conf                   /run/systemd/resolve/resolv.conf
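
To make the open question more concrete, here is a minimal Python sketch of what "populating that file" could involve. It is not part of sysbox-deploy-k8s. The helper names (`parse_snap_config`, `registry_host`, `build_auth_file`) are hypothetical, and the sketch assumes that an ECR password has already been obtained by some other means (for example from `aws ecr get-login-password` on a host that has the AWS CLI), which is exactly the piece missing inside the daemonset. The output follows the containers auth-file layout (`auths` keyed by registry, with a base64 `user:password` value) and uses `AWS` as the username, as ECR logins do.

```python
import base64
import json
from typing import Dict


def parse_snap_config(text: str) -> Dict[str, str]:
    """Parse the two-column 'Key  Value' output of `snap get <snap>`."""
    config: Dict[str, str] = {}
    for line in text.splitlines():
        parts = line.split(None, 1)  # split on the first run of whitespace
        if len(parts) == 2 and parts[0] != "Key":  # skip the header row
            config[parts[0]] = parts[1].strip()
    return config


def registry_host(image_ref: str) -> str:
    """Return the registry host portion of an image reference."""
    return image_ref.split("/", 1)[0]


def build_auth_file(registry: str, password: str, username: str = "AWS") -> str:
    """Render an auth file body for one registry.

    The password is assumed to come from outside this script; nothing
    here talks to AWS.
    """
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    return json.dumps({"auths": {registry: {"auth": token}}}, indent=2)


if __name__ == "__main__":
    sample = (
        "Key                        Value\n"
        "pod-infra-container-image  "
        "602401143452.dkr.ecr.us-west-2.amazonaws.com/eks/pause:3.1-eksbuild.1\n"
    )
    image = parse_snap_config(sample)["pod-infra-container-image"]
    # "PLACEHOLDER" stands in for a real ECR password.
    print(build_auth_file(registry_host(image), "PLACEHOLDER"))
```

Note that ECR authorization tokens expire (after 12 hours), so a file generated once at install time would presumably need to be refreshed periodically.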

rodnymolina avatar May 17 '22 22:05 rodnymolina

I believe this issue addresses a similar problem: https://github.com/cri-o/cri-o/issues/2614

rodnymolina avatar May 18 '22 03:05 rodnymolina

Is there a suggested workaround until this can be handled automatically? I'd like to install using the sysbox-deploy-k8s' daemonset, but I don't see a way to set pause_image_auth_file with the current installer.

I was thinking that I could patch the crio.conf as the installer runs, but since this installer restarts the kubelet and all containers, there is no viable spot to insert this patch. I don't see a good way to either change the pause_image attr or set the pause_image_auth_file in the crio.conf.

joeljeske avatar Nov 10 '22 16:11 joeljeske

@joeljeske, sorry for the delay.

While it's true that we are not currently handling the presence of the pause_image_auth_file attribute during Sysbox's installation, which is probably why we ran into this issue in the first place, we do support custom settings of the pause_image attribute.

That's to say that if you have a custom pause_image setting in your kubelet config, and then you attempt to install Sysbox, then you should see the original pause_image setting being honored and its proper value automatically set in crio.conf. Please let us know if that doesn't work in your setup.
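
For illustration only, the mapping described above can be sketched in Python as follows. This is not the installer's actual code. `render_crio_dropin` is a hypothetical helper that renders the `[crio.image]` settings discussed in this thread, with `pause_image_auth_file` included only when a path is given. Where the fragment is written (for example a drop-in file under `/etc/crio/crio.conf.d/`) is an assumption about the node's cri-o setup.

```python
from typing import Optional


def render_crio_dropin(pause_image: str, auth_file: Optional[str] = None) -> str:
    """Render a TOML fragment for cri-o's [crio.image] table.

    pause_image is the value taken from kubelet's
    pod-infra-container-image setting; auth_file is optional.
    """
    lines = ["[crio.image]", f'pause_image = "{pause_image}"']
    if auth_file:
        lines.append(f'pause_image_auth_file = "{auth_file}"')
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(
        render_crio_dropin(
            "602401143452.dkr.ecr.us-west-2.amazonaws.com/eks/pause:3.1-eksbuild.1",
            auth_file="/etc/crio/ecr-auth.json",  # hypothetical path
        )
    )
```

The sketch only builds the text; writing it to disk and restarting cri-o are left out, since those steps depend on how the installer sequences its restarts.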

rodnymolina avatar Nov 14 '22 03:11 rodnymolina