cortex
Allow the specification of backup image registry hosts
Investigate whether the CRI can be relied on to retry pulling Docker images from different hosts (i.e. try Quay first and then Docker Hub).
Determine which CRI is used by the Amazon Linux 2 AMIs. It might be helpful to SSH into an instance (see https://eksctl.io/usage/schema/#nodeGroups-ssh-enableSsm). To enable this, update generate_eks.py as follows before building the cluster:
def default_nodegroup(cluster_config):
    return {
        "ami": "auto",
        "ssh": {"enableSsm": True},
        "iam": {"withAddonPolicies": {"autoScaler": True}},
        "privateNetworking": cluster_config.get("subnet_visibility", "public") != "public",
        "kubeletExtraConfig": {
            "kubeReserved": {"cpu": "150m", "memory": "300Mi", "ephemeral-storage": "1Gi"},
            "kubeReservedCgroup": "/kube-reserved",
            "systemReserved": {"cpu": "150m", "memory": "300Mi", "ephemeral-storage": "1Gi"},
            "evictionHard": {"memory.available": "200Mi", "nodefs.available": "5%"},
        },
    }
In the AWS console, select a worker instance of the cluster and click Connect. You should then be able to open a shell on the worker through AWS SSM.
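As an alternative to opening a shell on the worker, the runtime can also be read from what each node reports to the Kubernetes API. The sketch below assumes `kubectl` is on the PATH and configured for the cluster; the helper names `container_runtimes` and `fetch_node_list` are hypothetical and not part of the Cortex codebase.

```python
import json
import subprocess


def container_runtimes(node_list):
    """Map node name -> runtime string reported by the kubelet.

    `node_list` is the parsed output of `kubectl get nodes -o json`.
    The runtime is reported in status.nodeInfo.containerRuntimeVersion,
    in the form "<runtime>://<version>".
    """
    runtimes = {}
    for node in node_list.get("items", []):
        name = node["metadata"]["name"]
        runtimes[name] = node["status"]["nodeInfo"]["containerRuntimeVersion"]
    return runtimes


def fetch_node_list():
    # Assumption: kubectl is installed and points at the cluster under test.
    result = subprocess.run(
        ["kubectl", "get", "nodes", "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)
```

Calling `container_runtimes(fetch_node_list())` returns one entry per node. A value starting with `docker://` indicates the kubelet is using Docker, while `containerd://` indicates containerd.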
Useful links:
https://github.com/containerd/containerd/blob/master/docs/cri/registry.md#configure-registry-endpoint
https://kubernetes.io/docs/concepts/containers/runtime-class/
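To make the registry-endpoint idea from the first link concrete, here is a minimal sketch that renders a containerd mirror section listing several endpoints for one registry host. It assumes the `[plugins."io.containerd.grpc.v1.cri".registry.mirrors]` layout described in that document, and that containerd tries the endpoints in the order listed. The helper `render_registry_mirrors` is hypothetical, and the Quay/Docker Hub pairing is only illustrative: a fallback like this can only work if the same repository path exists on both registries.

```python
def render_registry_mirrors(mirrors):
    """Render containerd CRI registry mirror sections as TOML text.

    `mirrors` maps a registry host, as it appears in image references,
    to the ordered list of endpoints to try for that host.
    """
    lines = []
    for host, endpoints in mirrors.items():
        quoted = ", ".join('"{}"'.format(endpoint) for endpoint in endpoints)
        lines.append(
            '[plugins."io.containerd.grpc.v1.cri".registry.mirrors."{}"]'.format(host)
        )
        lines.append("  endpoint = [{}]".format(quoted))
    return "\n".join(lines) + "\n"


# Illustrative only: images referenced as quay.io/... would fall back to Docker Hub.
print(render_registry_mirrors({
    "quay.io": ["https://quay.io", "https://registry-1.docker.io"],
}))
```

The rendered text would need to be merged into the node's containerd configuration (typically `/etc/containerd/config.toml`) before containerd starts. Whether this gives the desired retry behaviour on EKS nodes is exactly what this issue sets out to verify.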
Enabling containerd is possible with the following preBootstrapCommands:
"preBootstrapCommands": [
    "yum install containerd -y",
    "truncate -s-1 /etc/systemd/system/kubelet.service.d/10-eksclt.al2.conf",
    "echo -n ' --container-runtime=remote --container-runtime-endpoint=unix:///run/containerd/containerd.sock' >> /etc/systemd/system/kubelet.service.d/10-eksclt.al2.conf",
],
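For experimenting, the commands above can be attached to a nodegroup dictionary programmatically. This is only a sketch: the helper `with_containerd` and the two constants are hypothetical names, and the command strings are the ones listed above, unchanged.

```python
import copy

# Kubelet drop-in file targeted by the commands above (path as given there).
KUBELET_DROPIN = "/etc/systemd/system/kubelet.service.d/10-eksclt.al2.conf"
CONTAINERD_FLAGS = (
    " --container-runtime=remote"
    " --container-runtime-endpoint=unix:///run/containerd/containerd.sock"
)


def with_containerd(nodegroup):
    """Return a copy of `nodegroup` whose preBootstrapCommands enable containerd."""
    result = copy.deepcopy(nodegroup)
    commands = result.setdefault("preBootstrapCommands", [])
    commands.extend([
        "yum install containerd -y",
        # Drop the file's last byte (the trailing newline) so that the
        # flags appended next land on the last existing line.
        "truncate -s-1 {}".format(KUBELET_DROPIN),
        "echo -n '{}' >> {}".format(CONTAINERD_FLAGS, KUBELET_DROPIN),
    ])
    return result
```

A nodegroup built by `default_nodegroup(cluster_config)` could then be passed through `with_containerd` before the eksctl config is written. The input dictionary is left untouched, which makes it easy to build Docker and containerd nodegroups side by side.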
The problem is that when eksctl applies these commands, the CNI can no longer be initialized when the node joins the cluster.
A ticket has been opened at https://github.com/weaveworks/eksctl/issues/3572 to get feedback from the eksctl team. Messages have also been posted on the eksctl Slack, with no reply so far.
The research was conducted on this branch: https://github.com/cortexlabs/cortex/tree/reliability/containerd-cri-runtime.