release icon indicating copy to clipboard operation
release copied to clipboard

Draft a policy for updating Go versions across the Kubernetes codebase/infra

Open ixdy opened this issue 6 years ago • 65 comments

Kubernetes produces a new minor release every 3 months, and currently supports the 3 most recent releases. This means that a given release is supported for around 8-9 months.

Go produces a new release approximately every 6 months, and supports the 2 most recent releases. This means that a given release is supported for about a year.

The release schedules are slightly out of sync, however; the newest Go release usually occurs close to code freeze for a Kubernetes release, so we do not update until the next minor release, trailing the Go release by a few months.

This means that we often end up supporting a Kubernetes release well past the Go version it's using was supported; for example, we supported Kubernetes 1.10 through December 2018, but Go 1.9 (which it used) went out of support at the end of August 2018.

Similarly, Kubernetes 1.11 (expected to be supported through March 2019) and Kubernetes 1.12 (expected to be supported through June 2019) use Go 1.10, but this release will likely go out of support by Februrary 2019.

Historically we had some performance issues with new Go releases, which is why we generally avoided updating close to code freeze. There have also been occasional incompatible changes (particularly around gofmt), which make updates messier.

Still, we've tested the latest Go release extensively on master (and even have a release using it, Kubernetes 1.13), so we should probably update our older release branches, and perhaps make this a new policy going forward.


From @justaugustus in https://github.com/kubernetes/test-infra/pull/25355:

#25723 bumped the golang version to go1.18, so I've updated this to only include the new main variant. @kubernetes/release-engineering -- Needs /lgtm!

I missed, have we tested on -experimental at all yet? 1.18 looks like it has some exciting / major changes :D

@BenTheElder -- We created the go-canary variant in #18328 to prevent disruption for other experimental jobs when changing Golang major versions.

The experimental variant is used in enough non-SIG Release/Testing, that I've never felt comfortable with subjecting that many jobs to the change.

So right now, if someone is feeling adventurous: use go-canary. If not, they can use the master variant which doesn't get moved to "next" Golang until it lands in k/k.

That said, we don't really have a policy re: variants. @kubernetes/sig-testing-leads -- WDYT?

From @BenTheElder:

The experimental variant is used in enough non-SIG Release/Testing, that I've never felt comfortable with subjecting that many jobs to the change.

Users opting into -experimental know what they're getting into :-), we've rolled go ahead of k/k in the past here many many times.

IMHO we should be running with new go versions pretty extensively in CI before rolling them to the main repo, that's why -experimental exists (though test-infra now manages go version in repo anyhow ...). Any project choosing to use -experimental is signing up to be experimented on, if that's not suitable for their usage, they need to use another image.

KRTE in particular is more explicitly not even supported for other purposes than testing kubernetes with kind, for this reason (to avoid other usage blocking changes), but "experimental" in the tag should be a pretty good hint + past behavior of regular go upgrades.

I don't think go-canary is used widely enough, and the point of -experimental vs "go-canary" is that it might not only be go that we need to ensure is safe to upgrade, the same CI jobs may also be used to trial e.g. docker upgrades, it's wasteful to have jobs running only intended to be used for go upgrade testing.

At different points in time we have different things to update in the CI environment that should receive the same level of soaking through some e2e canary jobs etc before rolling to main/master / ...

I'm not sure what happened where this go round, but the go1.18 update was extremely not smooth, it doesn't feel like it got soaked in CI ahead of updating.

From @liggitt:

if the intent of this is just to enable master/main kubekins tags to be identical, this seems harmless.

I agree we should stop bumping master (and now main) images to a new version of go before demonstrating green runs on the go-canary / experimental image, but that seems outside the scope of this PR.

@justaugustus, is there an issue tracking the fallout of the last CI go bump to 1.18 and how we'll change the rollout for go 1.19?

ixdy avatar Jan 24 '19 19:01 ixdy

/sig release

ixdy avatar Jan 24 '19 19:01 ixdy

cc @dims @cblecker

FWIW, I support upgrading. It will be a bit of work, but it will mean we continue to use supported go going forward.

The gofmt incompatibility is only development time, and other changes should be safe and compatible, and we've already hopefully verified scaling issues with the newer go.

BenTheElder avatar Jan 24 '19 19:01 BenTheElder

This issue was at least partially motivated by the recent Go security releases.

I think in the short term we should make new patch releases of k8s 1.11 and 1.12 with go1.10.8, but we should aim to move those to go1.11.x before go1.10 goes out of support next month.

ixdy avatar Jan 24 '19 19:01 ixdy

@BenTheElder @ixdy - i'd support this new policy

cc @kubernetes/sig-release

dims avatar Jan 24 '19 19:01 dims

@BenTheElder @ixdy we should get buy-in from the current patch release managers too

dims avatar Jan 24 '19 19:01 dims

cc @foxish @feiskyer @aleksandra-malinowska, @tpepper @kubernetes/product-security-team

ixdy avatar Jan 24 '19 20:01 ixdy

I'm supportive of this policy too, but it will mean more work maintaining our release branches, especially with things like rules_go/bazel.

cblecker avatar Jan 25 '19 00:01 cblecker

if we don't want to maintain compatibility with the latest bazel + rules_go (historically we've generally kept those locked), we can always override the go version. We're already doing this on release-1.11: https://github.com/kubernetes/kubernetes/blob/3a10094374f22b621b3379a2f6cb81025ffe543f/build/root/WORKSPACE#L50-L67

ixdy avatar Jan 25 '19 01:01 ixdy

+1.

if we don't want to maintain compatibility with the latest bazel + rules_go (historically we've generally kept those locked), we can always override the go version.

This seems easier to do upgrading. But with this, will there some build issues because of override?

feiskyer avatar Jan 25 '19 09:01 feiskyer

Seems like we must at least trial moving our old branches forward to golang 1.11.x over the next weeks then (ie: see if scale test especially are okay). In the meantime need to release updated k8s builds with the golang patch fix release of the versions we're current using.

tpepper avatar Jan 25 '19 16:01 tpepper

But with this, will there some build issues because of override?

There shouldn't be. rules_go is pretty agnostic about Go versions.

ixdy avatar Jan 25 '19 19:01 ixdy

One catch: go1.10 -> go1.11 had a lot of breaking changes (fmt and vet), which makes the update harder than just "update the version".

ixdy avatar Feb 22 '19 19:02 ixdy

/priority important-soon No milestone, AFAICT this isn't directly relevant to v1.14 but instead all previous releases

spiffxp avatar Feb 24 '19 02:02 spiffxp

Go 1.12 is out now, which means that Kubernetes 1.11 and 1.12 are both now using an unsupported version of Go (1.10).

ixdy avatar Feb 26 '19 19:02 ixdy

One catch: go1.10 -> go1.11 had a lot of breaking changes (fmt and vet), which makes the update harder than just "update the version".

bumping go versions can cut both ways. as an example, https://github.com/golang/go/issues/27044 caused bottleneck issues for clients that needed to open more than a few hundred watches (aggregator, kubelet).

That issue is resolved in go1.12, which could be good to pick up, but the opposite would have happened when bumping to the version of go that first included https://github.com/golang/go/issues/27044.

liggitt avatar Feb 26 '19 20:02 liggitt

hm, so it sounds like go1.11 had some pretty serious regressions for Kubernetes? what should we do, then?

ixdy avatar Feb 26 '19 22:02 ixdy

if we want to do this we should it ASAP! we still have a week and half away to code freeze. we will be slammed with PRs the closer we get to code freeze. bisect(s) are going to get harder with volume if we run into regressions.

dims avatar Feb 26 '19 22:02 dims

Speaking about this in SIG Release, I have no strong opinion at present on the version of go used for previous releases.

I am willing to say yes if someone wants to try landing 1.12 support in master prior to Friday, otherwise I would rather punt this to the next release.

spiffxp avatar Feb 26 '19 22:02 spiffxp

I originally created this issue only to discuss updating the version of Go in older Kubernetes releases (notably k8s 1.11 and 1.12).

ixdy avatar Feb 26 '19 22:02 ixdy

I'm going to put on my RelEng lead hat and say, that at least for me, we're outside of the window where we should attempt this for 1.14. I think in general we need a pipeline that encodes the criteria we would use to evaluate the correctness of a go version bump. I would like to see such tooling applied to previously released versions of Kubernetes so we test again on lower risk targets. Let's perhaps work on some concrete requirements, in collaboration with SIGs like Scalability, and see some demos of how validation could work.

/cc @hoegaarden

calebamiles avatar Feb 26 '19 22:02 calebamiles

Go 1.12 is out now, which means that Kubernetes 1.11 and 1.12 are both now using an unsupported version of Go (1.10).

This is the part that concerns me the most. Is it better to set ourselves up for needing to scramble to update and validate if (when?) go security releases come out, or to get ourselves on versions that will be supported for the duration of the k8s release's support life?

liggitt avatar Feb 26 '19 23:02 liggitt

Open PR to update master to 1.12: https://github.com/kubernetes/kubernetes/pull/74632

Dunno if this is a 1.14.0 thing, or if we should leave it for later.. but at least we can get CI rolling and see how it performs.

cblecker avatar Feb 27 '19 00:02 cblecker

Please note that such scrambling still goes through some maintainers right now such as @ixdy with pretty low remaining bandwidth (AFAICT anyhow...) to push the necessary build images etc. IMHO it's preferable to have a plan for upgrading instead of scrambling at the last minute to figure out how to get patched go.

BenTheElder avatar Feb 27 '19 00:02 BenTheElder

The kubernetes 1.12 branch is now using an unsupported version of go (1.10.8). Have we come to a consensus on what we want to do about this? (Kubernetes 1.11 is also affected, but I'm not sure if we're planning any more support of that branch given that 1.14 has been released.)

ixdy avatar Mar 28 '19 17:03 ixdy

I'm ok of upgrading to golang 1.12 for release-1.12. @cblecker @ixdy Would you like to validate and land it?

feiskyer avatar Mar 29 '19 01:03 feiskyer

there is a latency regression and proxy flushing regression we need to resolve before we transition any other release branches to go1.12

liggitt avatar Mar 29 '19 01:03 liggitt

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle stale

fejta-bot avatar Jun 27 '19 02:06 fejta-bot

/remove-lifecycle stale /milestone v1.16

justaugustus avatar Jun 27 '19 02:06 justaugustus

there is a latency regression and proxy flushing regression we need to resolve before we transition any other release branches to go1.12

This has been fixed in the meantime - flush was fixed in 1.12.1 (or 2) IIRC, and performance regression was fixed in 1.12.5 - so we should be safe.

[BTW, we're also working with golang team now to validate upcoming 1.13 release, so that we won't be surprised again (at least from performance POV)].

wojtek-t avatar Jun 27 '19 06:06 wojtek-t

@justaugustus @wojtek-t @feiskyer @ixdy - are we there yet? :) (what's left to be done)

dims avatar Jul 13 '19 01:07 dims