datadog-operator

policy/v1beta1 no longer available in K8S 1.25

Open andysnowden opened this issue 3 years ago • 3 comments

Output of the info page (if this is a bug)

{"level":"ERROR","ts":"2022-09-28T19:38:58Z","logger":"setup","msg":"Problem running manager","error":"failed to wait for datadogagent caches to sync: no matches for kind \"PodDisruptionBudget\" in version \"policy/v1beta1\""}

Describe what happened: The operator fails to start because it uses the policy/v1beta1 PodDisruptionBudget API, which was deprecated in Kubernetes 1.21 and removed in 1.25.

Describe what you expected: The operator starts successfully.

Steps to reproduce the issue: Install the operator with Helm on a 1.25.x cluster.

Additional environment details (Operating System, Cloud provider, etc): kOps-deployed cluster on AWS.

This appears to be the only non-v1 API used in the code base. policy/v1 has been available since Kubernetes 1.21, and 1.21 is already marked as EOL. Switching to it would, however, limit the operator's backward compatibility to some extent.
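One way to adopt policy/v1 without dropping older clusters entirely is to choose the PodDisruptionBudget API version at runtime from what the API server actually serves. The Go sketch below illustrates that idea under stated assumptions: `pdbAPIVersion` is a hypothetical helper, not the fix that was eventually merged into the operator, and it assumes the served group/versions have already been fetched (for example, from `kubectl api-versions` or a discovery client).

```go
package main

import (
	"errors"
	"fmt"
)

// pdbAPIVersion is a hypothetical helper (not the operator's actual code).
// Given the "group/version" strings served by the API server, it prefers
// policy/v1 (available since Kubernetes 1.21) and falls back to
// policy/v1beta1 (removed in 1.25) only when v1 is not served.
func pdbAPIVersion(served []string) (string, error) {
	has := make(map[string]bool, len(served))
	for _, gv := range served {
		has[gv] = true
	}
	switch {
	case has["policy/v1"]:
		return "policy/v1", nil
	case has["policy/v1beta1"]:
		return "policy/v1beta1", nil
	default:
		return "", errors.New("no PodDisruptionBudget API version served")
	}
}

func main() {
	// A 1.25 cluster serves policy/v1 only.
	v, err := pdbAPIVersion([]string{"apps/v1", "policy/v1", "v1"})
	fmt.Println(v, err) // policy/v1 <nil>

	// A pre-1.21 cluster serves policy/v1beta1 only.
	v, err = pdbAPIVersion([]string{"apps/v1", "policy/v1beta1", "v1"})
	fmt.Println(v, err) // policy/v1beta1 <nil>
}
```

Preferring v1 whenever it is served means clusters between 1.21 and 1.24, which serve both versions, already use the non-deprecated API.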

andysnowden, Sep 29 '22

Hi @andysnowden, thanks for reporting! It's on our radar and we'll share an update here when it's addressed.

celenechang, Sep 29 '22

Great to hear @celenechang

Is there a rough ETA or potential workaround until then?

andysnowden, Sep 29 '22

Looks like this blocks me from installing the operator on my cluster. If anyone knows a way to temporarily patch this or another workaround, it would be much appreciated. Thanks.

Xosmond, Oct 09 '22

Any update on this topic?

diegoparrilla, Dec 15 '22

This is blocking us as well, any ETA?

renchap, Jan 02 '23

Hello, we merged two changes addressing this issue (#683 and #688); they will be included in the next v1.0.0 RC.

levan-m, Jan 19 '23

This is blocking us as well - looking forward to the release.

dlorent, Jan 20 '23

Also blocked here. Any updates?

joe-carpenter, Feb 21 '23

The change is available in v1.0.0-RC7 and later. We haven't backported it to v0.8 yet, since we are prioritizing v1.0 GA. It's in our backlog and we will post an ETA when we get to it.

As a general request, please indicate the version you are using when reporting an issue.

levan-m, Feb 21 '23

@levan-m The problem is that we can't even try v1.0.0-RC8, since there isn't any Helm release for it. If you could release v1.0.0-RC8 as a Helm chart, I think the majority of us would be happy to give it a try just so we can upgrade to 1.25. You would also benefit by getting more people to try your RC before 1.0.0 is out. If you can prioritize this, it would be great.

Or, of course, backport the fix to an older version.

nissessenap, Feb 22 '23

There is not (and won't be) a new chart for 1.0. The existing Operator chart can be configured to install a specific Operator version. It's currently set to 0.8.4 here and will be updated to 1.0 once it's generally available.

Please keep in mind that when installing Operator 1.0 with the above chart, you need to explicitly set the CRD version to v2alpha1 using datadog-crds.migration.datadogAgents.version. Please check issue 689, specifically this comment about the 1.0 setup.

levan-m, Feb 22 '23

@levan-m Is there a timeline for when 1.0 will be released? The current version of the operator does not support Kubernetes 1.25 because it uses an old API version for PodDisruptionBudgets. This is currently blocking our Kubernetes upgrade.

phillebaba, Mar 10 '23

Hello,

First, thank you for using the Operator and providing feedback and questions! We are still working towards v1.0 GA, with mid-April as the target date.


@phillebaba The PDB and PSP compatibility issues with 1.25 are addressed in v1.0 starting with RC7. To install v1.0 via the Helm chart, use the following command:

helm install datadog-operator datadog/datadog-operator --set image.tag=1.0.0-rc.10 --set datadog-crds.migration.datadogAgents.version=v2alpha1

This comment explains why these two parameters are needed.

EDIT: Please make sure you are using Operator chart 0.9.0 or later when running the above command.

EDIT: You have to use the v2alpha1 version of DatadogAgent.


Regarding my earlier comment:

"We haven't backported it to v0.8 yet, since we are prioritizing v1.0 GA. It's in our backlog and we will post an ETA when we get to it."

As we have a target date for GA, we are committing most of our resources to it and we won't backport the change to v0.8 for now. We will review our backlog post-GA and consider backporting if there is sufficient interest.

levan-m, Mar 13 '23

@levan-m The workaround throws new errors and the operator still cannot start:

{"level":"ERROR","ts":"2023-03-19T07:35:53Z","logger":"klog","msg":"Observed a panic: \"invalid memory address or nil pointer dereference\" (runtime error: invalid memory address or nil pointer dereference)\ngoroutine 666 [running]:\nk8s.io/apimachinery/pkg/util/runtime.logPanic({0x195e040?, 0x2ce1c60})\n\t/go/pkg/mod/k8s.io/apimachinery@…/pkg/util/runtime/runtime.go:74 +0x86\nk8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0x73?})\n\t/go/pkg/mod/k8s.io/apimachinery@…/pkg/util/runtime/runtime.go:48 +0x75\npanic({0x195e040, 0x2ce1c60})\n\t/usr/local/go/src/runtime/panic.go:838 +0x207\ngithub.com/DataDog/datadog-operator/pkg/controller/utils/datadog.(*metricsForwarder).getCredentialsV2(0xc0003acc30?, 0xc0003acb00)\n\t/workspace/pkg/controller/utils/datadog/metrics_forwarder.go:611 +0x69\ngithub.com/DataDog/datadog-operator/pkg/controller/utils/datadog.(*metricsForwarder).setupV2(0xc000437dc0)\n\t/workspace/pkg/controller/utils/datadog/metrics_forwarder.go:261 +0x24f\ngithub.com/DataDog/datadog-operator/pkg/controller/utils/datadog.(*metricsForwarder).connectToDatadogAPI(0xc000437dc0)\n\t/workspace/pkg/controller/utils/datadog/metrics_forwarder.go:303 +0x58\nk8s.io/apimachinery/pkg/util/wait.ConditionFunc.WithContext.func1({0x18, 0xc000680800})\n\t/go/pkg/mod/k8s.io/apimachinery@…/pkg/util/wait/wait.go:220 +0x1b\nk8s.io/apimachinery/pkg/util/wait.runConditionWithCrashProtectionWithContext({0x1e887b0?, 0xc000838680?}, 0xc0001d5dd0?)\n\t/go/pkg/mod/k8s.io/apimachinery@…/pkg/util/wait/wait.go:233 +0x57\nk8s.io/apimachinery/pkg/util/wait.poll({0x1e887b0, 0xc000838680}, 0x10?, 0xf4a905?, 0xc0009fe570?)\n\t/go/pkg/mod/k8s.io/apimachinery@…/pkg/util/wait/wait.go:580 +0x38\nk8s.io/apimachinery/pkg/util/wait.PollImmediateUntilWithContext({0x1e887b0, 0xc000838680}, 0x10?, 0xc000680800?)\n\t/go/pkg/mod/k8s.io/apimachinery@…/pkg/util/wait/wait.go:545 +0x49\nk8s.io/apimachinery/pkg/util/wait.PollImmediateUntil(0xc000437dc0?, 0xc000a42480?, 0x0?)\n\t/go/pkg/mod/k8s.io/apimachinery@…/pkg/util/wait/wait.go:536 +0x7c\ngithub.com/DataDog/datadog-operator/pkg/controller/utils/datadog.(*metricsForwarder).start(0xc000437dc0, 0x0?)\n\t/workspace/pkg/controller/utils/datadog/metrics_forwarder.go:165 +0x10c\ncreated by github.com/DataDog/datadog-operator/pkg/controller/utils/datadog.(*ForwardersManager).Register\n\t/workspace/pkg/controller/utils/datadog/forwarders_manager.go:69 +0x365\n"}

panic: runtime error: invalid memory address or nil pointer dereference [recovered]
	panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x13e5029]

goroutine 666 [running]:
k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0x73?})
	/go/pkg/mod/k8s.io/apimachinery@…/pkg/util/runtime/runtime.go:55 +0xd8
panic({0x195e040, 0x2ce1c60})
	/usr/local/go/src/runtime/panic.go:838 +0x207
github.com/DataDog/datadog-operator/pkg/controller/utils/datadog.(*metricsForwarder).getCredentialsV2(0xc0003acc30?, 0xc0003acb00)
	/workspace/pkg/controller/utils/datadog/metrics_forwarder.go:611 +0x69
github.com/DataDog/datadog-operator/pkg/controller/utils/datadog.(*metricsForwarder).setupV2(0xc000437dc0)
	/workspace/pkg/controller/utils/datadog/metrics_forwarder.go:261 +0x24f
github.com/DataDog/datadog-operator/pkg/controller/utils/datadog.(*metricsForwarder).connectToDatadogAPI(0xc000437dc0)
	/workspace/pkg/controller/utils/datadog/metrics_forwarder.go:303 +0x58
k8s.io/apimachinery/pkg/util/wait.ConditionFunc.WithContext.func1({0x18, 0xc000680800})
	/go/pkg/mod/k8s.io/apimachinery@…/pkg/util/wait/wait.go:220 +0x1b
k8s.io/apimachinery/pkg/util/wait.runConditionWithCrashProtectionWithContext({0x1e887b0?, 0xc000838680?}, 0xc0001d5dd0?)
	/go/pkg/mod/k8s.io/apimachinery@…/pkg/util/wait/wait.go:233 +0x57
k8s.io/apimachinery/pkg/util/wait.poll({0x1e887b0, 0xc000838680}, 0x10?, 0xf4a905?, 0xc0009fe570?)
	/go/pkg/mod/k8s.io/apimachinery@…/pkg/util/wait/wait.go:580 +0x38
k8s.io/apimachinery/pkg/util/wait.PollImmediateUntilWithContext({0x1e887b0, 0xc000838680}, 0x10?, 0xc000680800?)
	/go/pkg/mod/k8s.io/apimachinery@…/pkg/util/wait/wait.go:545 +0x49
k8s.io/apimachinery/pkg/util/wait.PollImmediateUntil(0xc000437dc0?, 0xc000a42480?, 0x0?)
	/go/pkg/mod/k8s.io/apimachinery@…/pkg/util/wait/wait.go:536 +0x7c
github.com/DataDog/datadog-operator/pkg/controller/utils/datadog.(*metricsForwarder).start(0xc000437dc0, 0x0?)
	/workspace/pkg/controller/utils/datadog/metrics_forwarder.go:165 +0x10c
created by github.com/DataDog/datadog-operator/pkg/controller/utils/datadog.(*ForwardersManager).Register
	/workspace/pkg/controller/utils/datadog/forwarders_manager.go:69 +0x365

BrokenFlame, Mar 19 '23

We're running into the above panic as well. Any workaround, @levan-m? Thanks for your work on this.

mikesplain, Mar 20 '23

@BrokenFlame, @mikesplain Which version of the DatadogAgent manifest are you applying after installing Operator 1.0 with the above command?

When you install the Operator with the above command, you have to use the v2alpha1 API version. Operator 1.0 doesn't have the conversion webhook enabled out of the box, so if you apply a v1alpha1 manifest, it will be serialized into v2alpha1 and some of its fields will be dropped. The webhook setup and corresponding documentation will be released with the GA.
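Because a v1alpha1 DatadogAgent is not rejected in this setup but stored with fields missing, which appears to be what triggered the panic reported earlier in this thread, a simple pre-flight check before applying a manifest can catch the mismatch. The Go sketch below is hypothetical and not part of the operator or its chart: `checkDatadogAgentVersion` is an illustrative name, it assumes the DatadogAgent API group is `datadoghq.com`, and it assumes the manifest is JSON, since decoding YAML would need a third-party library.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// typeMeta holds only the fields needed to identify a Kubernetes object.
type typeMeta struct {
	APIVersion string `json:"apiVersion"`
	Kind       string `json:"kind"`
}

// checkDatadogAgentVersion is a hypothetical pre-flight check that rejects
// DatadogAgent manifests not written for datadoghq.com/v2alpha1.
// Assumption: the manifest is JSON; YAML would need decoding first.
func checkDatadogAgentVersion(manifest []byte) error {
	var tm typeMeta
	if err := json.Unmarshal(manifest, &tm); err != nil {
		return fmt.Errorf("parse manifest: %w", err)
	}
	if tm.Kind != "DatadogAgent" {
		return nil // not a DatadogAgent, nothing to check
	}
	if tm.APIVersion != "datadoghq.com/v2alpha1" {
		return fmt.Errorf("DatadogAgent uses %q; Operator 1.0 without the conversion webhook expects datadoghq.com/v2alpha1", tm.APIVersion)
	}
	return nil
}

func main() {
	v1 := []byte(`{"apiVersion":"datadoghq.com/v1alpha1","kind":"DatadogAgent"}`)
	v2 := []byte(`{"apiVersion":"datadoghq.com/v2alpha1","kind":"DatadogAgent"}`)
	fmt.Println(checkDatadogAgentVersion(v1) != nil) // true
	fmt.Println(checkDatadogAgentVersion(v2))        // <nil>
}
```

Failing fast on the apiVersion is cheaper to debug than a nil pointer dereference deep inside the operator's metrics forwarder.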

levan-m, Mar 20 '23

That's right, I was using v1alpha1, but I assumed that either the webhook or migration.datadogAgents.version=v2alpha1 would handle the migration for us, or at least produce a better error than a panic.

Bummer that we're kind of blocked on this, but I understand your team's priority to get 1.0 out the door. Any pointers for manually migrating to v2alpha1, or WIP code for the webhook/docs? Thanks

mikesplain, Mar 20 '23

I am sorry you are getting blocked, and sorry for the confusing naming in our chart!

The webhook is controlled by a different property in the Helm chart (see here), but actually enabling it takes more than flipping that flag.

To unblock yourself, you could take a look at this doc update PR: https://github.com/DataDog/datadog-operator/pull/665. It may help if you try to manually map a v1alpha1 manifest to v2alpha1.

DataDog/helm-charts#918 adds webhook configuration and documentation to the chart. However, it still has several issues and may not work out of the box.

levan-m, Mar 20 '23

I'm also currently blocked by this issue. Is this still on track for a mid-April release?

OurFriendIrony, Apr 02 '23

We are also blocked with this issue. Any update on the release?

pythian-chaitanya, Apr 18 '23

We released 1.0.0 about two weeks ago and recommend migrating to it to avoid this issue. Right now, we don't have plans to backport the fix to 0.x.x versions.

This is the official migration guide: https://docs.datadoghq.com/containers/guide/datadogoperator_migration/

Please also check out the updated Operator and Helm chart README docs: https://github.com/DataDog/datadog-operator/blob/main/README.md and https://github.com/DataDog/helm-charts/blob/main/charts/datadog-operator/README.md

Given this problem has been addressed since 1.0.0-rc7 and we have a migration path to 1.0.0, I consider the issue resolved.

levan-m, Apr 18 '23