
Standardize the Cloud Controller Manager Build/Release Process

Open · andrewsykim opened this issue 6 years ago • 18 comments

Right now each provider is building/releasing the external cloud controller manager in its own way. It might be beneficial to standardize this going forward, or at least set some guidelines on what is expected from a cloud controller manager build/release.

Some questions to consider:

  • What should a CCM release include? Docker image? Binaries? Source Code?
  • What base images are acceptable for a CCM build? Does it even matter?

We've had this discussion multiple times at KubeCons and on SIG calls; it would be great to get some of those ideas vocalized here and formalized in a doc going forward.

cc @cheftako @jagosan @hogepodge @frapposelli @yastij @dims @justaugustus

andrewsykim avatar Jun 18 '19 15:06 andrewsykim

The first thing to sort out is how the modules are set up, so that Go module updates work correctly.

The standard main.go has dependencies on 'k8s.io/kubernetes' and 'k8s.io/component-base'.

Component base isn't semantically versioned properly, and fetching the main kubernetes module causes a load of version failures, because the staging redirect 'replace' entries in its go module file don't apply when the module is consumed from an external project.
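
For reference, the staging redirects in the root go.mod of k8s.io/kubernetes point at local directories, roughly like this (abridged); those relative paths mean nothing once the module is fetched as a dependency:

replace (
	k8s.io/api => ./staging/src/k8s.io/api
	k8s.io/apimachinery => ./staging/src/k8s.io/apimachinery
	k8s.io/client-go => ./staging/src/k8s.io/client-go
	// ...one entry per staging repo
)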

NeilW avatar Jun 20 '19 11:06 NeilW

$ go get k8s.io/kubernetes@v1.15.0
go: finding k8s.io/apiextensions-apiserver v0.0.0
go: finding k8s.io/apiserver v0.0.0
go: finding k8s.io/kube-proxy v0.0.0
go: finding k8s.io/cloud-provider v0.0.0
go: finding k8s.io/kube-scheduler v0.0.0
go: finding k8s.io/cluster-bootstrap v0.0.0
go: finding k8s.io/csi-translation-lib v0.0.0
go: finding k8s.io/client-go v0.0.0
go: finding k8s.io/kubelet v0.0.0
go: finding k8s.io/sample-apiserver v0.0.0
go: k8s.io/apiextensions-apiserver@v0.0.0: unknown revision v0.0.0
go: k8s.io/apiserver@v0.0.0: unknown revision v0.0.0
go: k8s.io/kube-proxy@v0.0.0: unknown revision v0.0.0
go: k8s.io/cloud-provider@v0.0.0: unknown revision v0.0.0
go: k8s.io/kube-scheduler@v0.0.0: unknown revision v0.0.0
go: k8s.io/cluster-bootstrap@v0.0.0: unknown revision v0.0.0
go: k8s.io/csi-translation-lib@v0.0.0: unknown revision v0.0.0
go: k8s.io/client-go@v0.0.0: unknown revision v0.0.0
go: k8s.io/kubelet@v0.0.0: unknown revision v0.0.0
go: finding k8s.io/apimachinery v0.0.0
go: finding k8s.io/kube-controller-manager v0.0.0
go: finding k8s.io/kube-aggregator v0.0.0
go: finding k8s.io/metrics v0.0.0
go: k8s.io/sample-apiserver@v0.0.0: unknown revision v0.0.0
go: finding k8s.io/code-generator v0.0.0
go: finding k8s.io/cri-api v0.0.0
go: finding k8s.io/legacy-cloud-providers v0.0.0
go: finding k8s.io/component-base v0.0.0
go: finding k8s.io/cli-runtime v0.0.0
go: finding k8s.io/api v0.0.0
go: k8s.io/apimachinery@v0.0.0: unknown revision v0.0.0
go: k8s.io/kube-controller-manager@v0.0.0: unknown revision v0.0.0
go: k8s.io/kube-aggregator@v0.0.0: unknown revision v0.0.0
go: k8s.io/metrics@v0.0.0: unknown revision v0.0.0
go: k8s.io/code-generator@v0.0.0: unknown revision v0.0.0
go: k8s.io/cri-api@v0.0.0: unknown revision v0.0.0
go: k8s.io/legacy-cloud-providers@v0.0.0: unknown revision v0.0.0
go: k8s.io/component-base@v0.0.0: unknown revision v0.0.0
go: k8s.io/cli-runtime@v0.0.0: unknown revision v0.0.0
go: k8s.io/api@v0.0.0: unknown revision v0.0.0
go: error loading module requirements

NeilW avatar Jun 20 '19 11:06 NeilW

Thanks @NeilW! I agree that removing imports of k8s.io/kubernetes will help here. There were some discussions in the past to move k8s.io/kubernetes/cmd/cloud-controller-manager to either k8s.io/cloud-provider/cmd/cloud-controller-manager or k8s.io/cloud-controller-manager. The tricky part is that all cloud-specific controllers would then also need to move to an external repo, since you can't import k8s.io/kubernetes from a staging repo. Would love your thoughts on what would be ideal for your provider. cc @timoreimann for feedback from DigitalOcean

re: k8s.io/component-base not being semantically versioned, can you open an issue in kubernetes/kubernetes for that?

andrewsykim avatar Jun 20 '19 14:06 andrewsykim

I've spent a day struggling with 1.15 and I've still not managed to get the dependencies sorted out for the cloud-provider. It looks like I'll have to manually code 'replace' entries for all the repos in the 'staging' area of the kubernetes repo. So we definitely have a problem.

However, that does open up a possibility for making cloud providers more standard. If you built a dummy provider that responded to end-to-end tests and was published in a standard way, but didn't actually do anything, then you could 'replace' that provider's interface repo path with a path to a provider's repo that implements the same interface.

That allows you to simply replicate the standard repo as, say, 'brightbox-cloud-provider' and just change the 'replace' entry in the 'go.mod' to point to, say, 'brightbox/brightbox-cloud-provider-interface'. Then you can follow the same automated integration testing and deployment/publishing process as the standard dummy provider.
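
A minimal sketch of the go.mod in such a replicated repo; all module paths here are hypothetical stand-ins, not real repos:

module github.com/kubernetes/brightbox-cloud-provider

require k8s.io/cloud-provider-interface v0.0.0

// Swap the dummy interface for the vendor's real implementation.
replace k8s.io/cloud-provider-interface => github.com/brightbox/brightbox-cloud-provider-interface v1.2.3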

And on the interface repo that people like me maintain, we can run unit tests and set up the dependencies with our own 'go.mod', completely decoupled from the cloud-provider 'shell' the interface will be compiled into.

NeilW avatar Jun 21 '19 08:06 NeilW

In terms of a publishing process, the one I use with HashiCorp to publish our Terraform provider is a good one: I go on a Slack channel and ask them to roll a new release, and after a few manual checks the maintainer of the central repo holding the providers hits the go button on the automated release system.

Now, HashiCorp has staff managing that central provider repo (https://github.com/terraform-providers), and that may not work for k8s given the nature of the project. But it's something to consider.

NeilW avatar Jun 21 '19 08:06 NeilW

I haven't upgraded DigitalOcean's CCM to 1.15 yet, but I do remember that moving to the 1.14 deps was quite a hassle. For instance, it required adding a replace directive for apimachinery, which wasn't obvious to spot.

I noticed that the latest client-go, v12.0.0 (corresponding to Kubernetes 1.15, as it seems), encodes these replace directives in its go.mod file now. My guess is that if cloud-provider followed the same pattern of accurately pinning down dependencies for each release through Go modules, consuming cloud-provider would become easier.
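
For illustration, that pinning pattern in a consumer's go.mod would look something like this; the pseudo-version suffixes below are placeholders for whatever commits the kubernetes-1.15.0 tags actually resolve to:

replace (
	// Pin the unversioned staging modules to concrete commits.
	k8s.io/api => k8s.io/api v0.0.0-20190620084959-aaaaaaaaaaaa
	k8s.io/apimachinery => k8s.io/apimachinery v0.0.0-20190612205821-bbbbbbbbbbbb
)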

@NeilW's idea of providing a dummy provider is interesting, though I'm not sure I fully grasped yet how that'd be consumed. In general, I'd definitely appreciate a sample provider that described the canonical way of setting up a custom cloud provider. Last time I went over some of the available implementations from the different clouds, they all had slight variations; that could easily be because their development cycles can't possibly be synchronized perfectly, or maybe there are legitimate reasons to have divergent setups?

It'd be great to have a "source of truth" that outlines one or more recommended setups (similar to client-go's sample directory).

timoreimann avatar Jun 21 '19 09:06 timoreimann

@andrewsykim

There were some discussions in the past to move k8s.io/kubernetes/cmd/cloud-controller-manager to either k8s.io/cloud-provider/cmd/cloud-controller-manager or k8s.io/cloud-controller-manager.

I'm all in favor of removing any dependencies on k8s.io/kubernetes that are currently still in cloud-provider, since those tend to pull in a fair number of transitive packages (which are presumably not all required?).

What's the benefit of moving the cloud provider command part into a new, separate repository? My gut feeling is that it would be easier to reuse the existing k8s.io/cloud-provider repository we have today. Is there any prior discussion available to possibly gain more context around the various pros and cons?

timoreimann avatar Jun 21 '19 09:06 timoreimann

@NeilW's idea of providing a dummy provider is interesting, though I'm not sure I fully grasped yet how that'd be consumed.

Less that we would consume cloud-provider and more that it would consume us.

  1. Copy cloud-provider to a new repo digitalocean-cloud-provider within a k8s organisation that holds and publishes the cloud-providers.
  2. Alter the go.mod and add a replace that says k8s.io/cloud-provider-interface => github.com/digitalocean/digitalocean-cloud-provider-interface vX.Y.Z
  3. Run the release process on that repo, which compiles, builds, and tests the cloud-provider, then publishes the container somewhere central.

We then just build our provider interface libraries against the published Go interface.
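
To make step 2 concrete, the vendor-side library would implement and register the cloud provider interface from k8s.io/cloud-provider. A minimal stub against the 1.15-era interface; the provider name and package path are illustrative:

package provider

import (
	"io"

	cloudprovider "k8s.io/cloud-provider"
)

// ProviderName is illustrative; each vendor registers its own.
const ProviderName = "example"

func init() {
	// The CCM shell looks the provider up by name at startup.
	cloudprovider.RegisterCloudProvider(ProviderName,
		func(config io.Reader) (cloudprovider.Interface, error) {
			return &cloud{}, nil
		})
}

// cloud satisfies cloudprovider.Interface; a real implementation
// returns its LoadBalancer/Instances/etc. controllers below.
type cloud struct{}

func (c *cloud) Initialize(b cloudprovider.ControllerClientBuilder, stop <-chan struct{}) {}
func (c *cloud) LoadBalancer() (cloudprovider.LoadBalancer, bool) { return nil, false }
func (c *cloud) Instances() (cloudprovider.Instances, bool)       { return nil, false }
func (c *cloud) Zones() (cloudprovider.Zones, bool)               { return nil, false }
func (c *cloud) Clusters() (cloudprovider.Clusters, bool)         { return nil, false }
func (c *cloud) Routes() (cloudprovider.Routes, bool)             { return nil, false }
func (c *cloud) ProviderName() string                             { return ProviderName }
func (c *cloud) HasClusterID() bool                               { return true }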

NeilW avatar Jun 21 '19 09:06 NeilW

In terms of updating to 1.15

  • Strip your go.mod down to just the requires for your provider
  • Auto-generate the k8s.io require and replace entries using something like this (a rough equivalent is sketched below)

Hope that saves somebody a lot of time.
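
In case the link above rots, the gist of such a generator is a loop that resolves each staging repo's release tag to a pseudo-version and writes the replace entry (repo list abridged; adjust the tag to your target release):

for repo in api apimachinery apiserver client-go cloud-provider component-base; do
  # Resolve the non-semver release tag to its canonical pseudo-version.
  v=$(go mod download -json "k8s.io/${repo}@kubernetes-1.15.0" |
      sed -n 's|.*"Version": "\(.*\)".*|\1|p')
  go mod edit "-replace=k8s.io/${repo}=k8s.io/${repo}@${v}"
done
go mod tidy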

NeilW avatar Jun 21 '19 12:06 NeilW

/assign @yastij

andrewsykim avatar Jul 10 '19 20:07 andrewsykim

For v1.16: consensus on what the build/release process for CCM should look like.

andrewsykim avatar Jul 10 '19 21:07 andrewsykim

A couple of things:

  • should we rely on Prow, plus the fact that the release machinery should be OSS, for the CCMs hosted under k/k? I would say yes. This would let users know how the artefacts they're using are built.

  • I think the process and its outputs should be as transparent as possible

  • As for base images, I think we should follow what we're doing upstream (i.e. images should be based on distroless; a sketch follows below)

also I think we should start publishing binaries stripped of the in-tree cloud providers; this would help drive adoption. cc @kubernetes/release-engineering
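
A sketch of what a distroless-based CCM image could look like; the image names and paths are illustrative, not a settled convention:

# Build a static binary so it can run in a shell-less base image.
FROM golang:1.12 AS builder
WORKDIR /src
COPY . .
RUN CGO_ENABLED=0 go build -o /cloud-controller-manager .

# Distroless final stage: no shell or package manager to patch.
FROM gcr.io/distroless/static:latest
COPY --from=builder /cloud-controller-manager /cloud-controller-manager
ENTRYPOINT ["/cloud-controller-manager"]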

yastij avatar Jul 12 '19 12:07 yastij

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle stale

fejta-bot avatar Dec 31 '19 22:12 fejta-bot

/remove-lifecycle stale

cheftako avatar Jan 02 '20 23:01 cheftako

/lifecycle frozen

cheftako avatar Jan 02 '20 23:01 cheftako

/help

andrewsykim avatar Apr 15 '20 20:04 andrewsykim

@andrewsykim: This request has been marked as needing help from a contributor.

Please ensure the request meets the requirements listed here.

If this request no longer meets these requirements, the label can be removed by commenting with the /remove-help command.

In response to this:

/help

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Apr 15 '20 20:04 k8s-ci-robot

@cheftako to put together a short proposal for v1.19

andrewsykim avatar Apr 15 '20 20:04 andrewsykim