
GKE upgrades blocked due to use of deprecated APIs

sjahl opened this issue

It looks like our GKE cluster running gnomAD won't auto-upgrade to the Kubernetes 1.22 release, because some resources we have deployed are using deprecated APIs:

[Screenshot, 2022-07-08: GKE's list of deployed resources using deprecated APIs]

I still have to track down what that Ingress object is, but I'm pretty sure the validating webhook is from the Elastic/ECK operator. From a quick look at Elastic's EOL policies, both the version of the ECK operator we're running (1.2.1) and our version of Elasticsearch itself (6.8.x) were deprecated in January.
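Both resources flagged here (a validating webhook and an Ingress) fall into API groups whose v1beta1 versions were removed in Kubernetes 1.22. As a rough way to find similar resources before applying them, the sketch below scans manifest text for `apiVersion`/`kind` pairs removed in 1.22. It is a minimal sketch under stated assumptions: `REMOVED_IN_1_22` and `find_removed_apis` are hypothetical names, the table is only a partial list taken from the Kubernetes deprecated-API migration guide, and the YAML handling is a naive line scan rather than a real parser.

```python
# Hypothetical sketch: flag manifests that use API versions removed in Kubernetes 1.22.
# The table below is PARTIAL; check the upstream deprecated-API migration guide.
REMOVED_IN_1_22 = {
    ("admissionregistration.k8s.io/v1beta1", "ValidatingWebhookConfiguration"),
    ("admissionregistration.k8s.io/v1beta1", "MutatingWebhookConfiguration"),
    ("apiextensions.k8s.io/v1beta1", "CustomResourceDefinition"),
    ("extensions/v1beta1", "Ingress"),
    ("networking.k8s.io/v1beta1", "Ingress"),
}


def top_level_fields(document: str) -> dict:
    """Collect unindented `key: value` scalars from one YAML document (naive scan)."""
    fields = {}
    for line in document.splitlines():
        # Skip blanks, indented (nested) keys, comments and list items.
        if not line or line[0] in " \t#-":
            continue
        key, sep, value = line.partition(":")
        if sep and value.strip():
            fields[key.strip()] = value.strip().strip("'\"")
    return fields


def find_removed_apis(manifest_text: str) -> list:
    """Return (apiVersion, kind) for every document that uses a removed API."""
    # Split a multi-document YAML stream on `---` separator lines.
    documents = [[]]
    for line in manifest_text.splitlines():
        if line.strip() == "---":
            documents.append([])
        else:
            documents[-1].append(line)
    hits = []
    for lines in documents:
        fields = top_level_fields("\n".join(lines))
        key = (fields.get("apiVersion"), fields.get("kind"))
        if key in REMOVED_IN_1_22:
            hits.append(key)
    return hits


example = """\
apiVersion: admissionregistration.k8s.io/v1beta1
kind: ValidatingWebhookConfiguration
metadata:
  name: example-webhook
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: example-ingress
"""
print(find_removed_apis(example))
```

Only the webhook document is reported; the Ingress already uses `networking.k8s.io/v1`. A real tool would use a YAML parser and the full removal table.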

I think we need to assess our compatibility with Elasticsearch 7.17+ and plan an upgrade using a newer version of the ECK operator. I don't believe the blocked GKE upgrades are critical at the moment. The EOL date for GKE 1.21 is in December 2022, so I believe we have until at least then before they force an upgrade on us.

sjahl, Jul 08 '22 13:07

Thanks for noticing this, Steve. If EOL is this year, we should definitely prioritize an ES upgrade soon (https://github.com/broadinstitute/gnomad-browser/issues/929).

mattsolo1, Jul 08 '22 14:07

Hi @sjahl, I'm new to gnomAD and trying to get a deployment running. I've encountered a related issue while setting up a new deployment:

Using deployctl for a new deployment on GCP creates a cluster running Kubernetes 1.22 (labeled as stable by GCP as of Nov 2022), which has removed the apiextensions.k8s.io/v1beta1 API in favour of apiextensions.k8s.io/v1. Then, when configuring the cluster for Elasticsearch, deployctl installs ECK 1.2.1, which is very outdated and still relies on the removed apiextensions.k8s.io/v1beta1 API. ECK started supporting Kubernetes 1.22 in version 1.7.x, and they're now at 2.5.0! Also, in version 1.7.0 they switched from a single all-in-one.yaml manifest to two separate files: one for the custom resource definitions (crds.yaml) and one for the operator (operator.yaml).
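Because of that manifest split, a patched deployctl would need to pick different files depending on the ECK version. The sketch below shows one way to do it. It is hypothetical and not deployctl's actual code: `eck_manifest_urls` and `ECK_DOWNLOAD_BASE` are illustrative names, and the download URL pattern `https://download.elastic.co/downloads/eck/<version>/<file>` is an assumption based on the file names mentioned in this thread.

```python
# Hypothetical helper: choose which ECK install manifests to apply for a version.
# Assumes releases before 1.7.0 ship all-in-one.yaml, and 1.7.0+ ship
# crds.yaml plus operator.yaml (the split reported in this thread).
ECK_DOWNLOAD_BASE = "https://download.elastic.co/downloads/eck"  # assumed URL pattern


def parse_version(version: str) -> tuple:
    """Turn '2.5.0' into (2, 5, 0) so versions compare numerically, not as strings."""
    return tuple(int(part) for part in version.split("."))


def eck_manifest_urls(version: str) -> list:
    """Return the manifest URLs for an ECK release, in the order to apply them."""
    if parse_version(version) < (1, 7, 0):
        files = ["all-in-one.yaml"]
    else:
        # CRDs first, so the custom resource types exist before the operator starts.
        files = ["crds.yaml", "operator.yaml"]
    return [f"{ECK_DOWNLOAD_BASE}/{version}/{name}" for name in files]


for v in ("1.2.1", "2.5.0"):
    for url in eck_manifest_urls(v):
        print(url)
```

Comparing parsed tuples rather than raw strings matters here: as strings, "1.10.0" would sort before "1.7.0".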

Anyway, as we proceed with our deployment we'll be investigating this issue further and will probably try to patch deployctl. Maybe this can become a standalone issue? If so, I'm happy to take it on and open a PR if that helps.

ammazzaw, Nov 11 '22 02:11

Hi @ammazzaw, we are planning to upgrade both ECK and Elasticsearch itself relatively soon; it's pretty close to the top of my priority stack right now: https://github.com/broadinstitute/gnomad-browser/issues/929. The big open question for us at the moment is whether Elasticsearch 7 or 8 includes any changes that break gnomAD, and we're working on a plan to test for those. I'd welcome any feedback you have in that area as you get something stood up.

In the grand scheme of things, I'd like to deprecate deployctl in favor of more standard deployment tooling (e.g. Terraform and Helm/Kustomize), so I'm not placing a high priority on patching deployctl itself. But, I'm happy to review and consider any patches that you have. The only constraint is that we can't really let the deployctl script advance too far beyond the official gnomAD browser deployment.

sjahl, Nov 14 '22 13:11

The patch below allows deployment with the current codebase by pinning the cluster to GKE version 1.21.14-gke.3000 instead of enabling auto-upgrade:

diff --git a/deploy/deployctl/subcommands/setup.py b/deploy/deployctl/subcommands/setup.py
index 219a25a8..94a501b7 100644
--- a/deploy/deployctl/subcommands/setup.py
+++ b/deploy/deployctl/subcommands/setup.py
@@ -210,7 +210,7 @@ def create_cluster() -> None:
             f"--zone={config.zone}",
             "--release-channel=stable",
             "--enable-autorepair",
-            "--enable-autoupgrade",
+            "--cluster-version=1.21.14-gke.3000",
             "--maintenance-window=7:00",
             f"--service-account={config.gke_service_account_full_name}",
             f"--network={config.network_name}",
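The patch works because a 1.21 cluster still serves the apiextensions.k8s.io/v1beta1 CRD API that ECK 1.2.1 needs, and that API is gone from 1.22 onward. The pin can only be a stopgap, since GKE 1.21 reaches EOL in December 2022. A guard like the hypothetical `serves_v1beta1_crds` below could catch a mismatch before installing the old operator. It is a minimal sketch that assumes GKE version strings follow the `<major>.<minor>.<patch>-gke.<build>` shape seen in the patch; `1.22.15-gke.100` in the example is a made-up version string.

```python
import re

# Assumed GKE version shape, e.g. "1.21.14-gke.3000" (as used in the patch).
GKE_VERSION = re.compile(r"^(\d+)\.(\d+)\.(\d+)-gke\.(\d+)$")


def serves_v1beta1_crds(gke_version: str) -> bool:
    """True if the Kubernetes minor version is below 1.22, where v1beta1 CRDs were removed."""
    match = GKE_VERSION.match(gke_version)
    if match is None:
        raise ValueError(f"unrecognised GKE version: {gke_version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    return (major, minor) < (1, 22)


print(serves_v1beta1_crds("1.21.14-gke.3000"))  # True
print(serves_v1beta1_crds("1.22.15-gke.100"))  # False (hypothetical version string)
```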

peternixon, Nov 15 '22 05:11

Gonna close this: the prod gnomAD cluster has been updated to ECK 2.5.0, which should resolve the outstanding API deprecation warnings. The deployctl changes are in #1039.

sjahl, Nov 18 '22 20:11