kops icon indicating copy to clipboard operation
kops copied to clipboard

etcd-manager-slim built with ko

Open hakman opened this issue 6 months ago • 4 comments

/cc @rifelpet @justinsb

hakman avatar Jun 22 '25 06:06 hakman

Skipping CI for Draft Pull Request. If you want CI signal for your change, please convert it to an actual PR. You can still manually trigger a test run with /test all

k8s-ci-robot avatar Jun 22 '25 06:06 k8s-ci-robot

/test pull-kops-e2e-k8s-aws-calico

hakman avatar Jun 22 '25 06:06 hakman

pull-kops-e2e-k8s-aws-calico succeeded using the new etcd-manager-slim image built with ko. Minor adjustments are needed, as seen in this POC. The only missing piece is etcd-manager-ctl.

hakman avatar Jun 22 '25 12:06 hakman

/test pull-kops-e2e-k8s-aws-calico /test pull-kops-e2e-k8s-gce-cilium

hakman avatar Jun 29 '25 04:06 hakman

/test pull-kops-e2e-k8s-aws-calico /test pull-kops-e2e-k8s-gce-cilium

hakman avatar Jun 29 '25 04:06 hakman

@rifelpet @justinsb This should be ready for review now.

hakman avatar Jun 29 '25 11:06 hakman

/test pull-kops-e2e-k8s-aws-amazonvpc

hakman avatar Jun 29 '25 12:06 hakman

I'm surprised with the bazel-less builds there is a significant increase in image size. We also lose some reproducibility based on some timestamps changing:

registry.k8s.io/etcd-manager/etcd-manager-slim     v3.0.20250629     e44fd24eed1b   2 weeks ago     232MB
registry.k8s.io/etcd-manager/etcd-manager-slim     v3.0.20241012     44e2eaa3e9ae   55 years ago    152MB

rifelpet avatar Jun 29 '25 13:06 rifelpet

I'm surprised with the bazel-less builds there is a significant increase in image size. We also lose some reproducibility based on some timestamps changing:

registry.k8s.io/etcd-manager/etcd-manager-slim     v3.0.20250629     e44fd24eed1b   2 weeks ago     232MB
registry.k8s.io/etcd-manager/etcd-manager-slim     v3.0.20241012     44e2eaa3e9ae   55 years ago    152MB

Bazel builds were based on distroless, now the base is debian12-slim. Also, binaries are a little bigger now in v3.0.20250629 vs v3.0.20250628.

hakman avatar Jun 29 '25 13:06 hakman

@rifelpet Any idea about the failure in amazonvpc CNI test?

I0629 13:19:57.131784 29505 middleware.go:44] AWS request: EC2 DescribeVolumes W0629 13:19:57.135973 29505 boot.go:44] error during attempt to bootstrap (will sleep and retry): unable to attach master volumes: error querying for EC2 volumes: operation error EC2: DescribeVolumes, exceeded maximum number of attempts, 3, https response error StatusCode: 0, RequestID: , request send failed, Post "https://ec2.eu-west-1.amazonaws.com/": tls: failed to verify certificate: x509: certificate signed by unknown authority

hakman avatar Jun 29 '25 13:06 hakman

Looks like RHEL/AL2023 use different paths for certs: https://github.com/golang/go/blob/go1.24.4/src/crypto/x509/root_linux.go#L19-L23

// Possible directories with certificate files; all will be read.
var certDirectories = []string{
        "/etc/ssl/certs",     // SLES10/SLES11, https://golang.org/issue/12139
        "/etc/pki/tls/certs", // Fedora/RHEL
}

hakman avatar Jun 29 '25 14:06 hakman

/test pull-kops-e2e-k8s-aws-amazonvpc

hakman avatar Jun 29 '25 14:06 hakman

Looks like RHEL/AL2023 use different paths for certs: https://github.com/golang/go/blob/go1.24.4/src/crypto/x509/root_linux.go#L19-L23

One thing we could do is use bind mounting to "fix" this, making sure that the path is always in either a standard place we define, or alternatively bind mounting to e.g. /etc/kubernetes/etcd-manager/ca-certs/ so that we can do per-pod configuration. That way we could have a static manifest, which I personally think is easier to follow. It's overkill for this use case, because etcd-manager comes in so early though, so I think we probably should not do this in this PR (it's also a separable idea, so I think we should separate it into another PR if we do choose to pursue it).

Do you think this PR is ready to merge? It is looking good...

justinsb avatar Jun 29 '25 16:06 justinsb

@justinsb I think for sure it can be improved, but we should merge it and get some testgrid feedback.

hakman avatar Jun 29 '25 16:06 hakman

Thanks @hakman

/approve /lgtm

justinsb avatar Jun 29 '25 16:06 justinsb

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: justinsb

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment Approvers can cancel approval by writing /approve cancel in a comment

k8s-ci-robot avatar Jun 29 '25 16:06 k8s-ci-robot