Teams and Projects Mismatched, "Requested resource does not exist" error in logs
Describe the bug
I set up a fairly simple configuration of Teams, Projects, and ProjectKeys to test out the operator. For each project name in a list, I created a Team containing a single Project of the same name, along with a ProjectKey:
```yaml
apiVersion: sentry.kubernetes.jaceys.me/v1alpha1
kind: Team
metadata:
  name: team-aaaa
  namespace: sentry-operator-system
spec:
  name: aaaa
  slug: aaaa
---
apiVersion: sentry.kubernetes.jaceys.me/v1alpha1
kind: Project
metadata:
  name: project-aaaa
  namespace: sentry-operator-system
spec:
  name: aaaa
  slug: aaaa
  team: aaaa
---
apiVersion: sentry.kubernetes.jaceys.me/v1alpha1
kind: ProjectKey
metadata:
  name: production-aaaa
  namespace: sentry-operator-system
spec:
  name: production
  project: aaaa
```
This configuration is repeated with bbbb, cccc, etc.
After some amount of time, these projects started being affiliated with the wrong teams in Sentry. There were no manual changes to the configuration of projects or teams in Sentry, so I presume the operator did this accidentally on its own. The only other actions that occurred were occasional redeploys of the operator, along with my custom Team, Project, and ProjectKey definitions, to Kubernetes.
Steps to reproduce
- Create Teams, Projects, and ProjectKeys affiliated with each other, all with the same name (see example above). For example, the "aaaa" team has a project named "aaaa".
- After some amount of time, projects and teams get out of alignment. E.g., team "cccc" now has no projects, and team "bbbb" instead contains projects "bbbb" and "cccc". Since I have kept a one-to-one mapping of projects to teams in my setup, this is easy to observe.
- Either as a cause or a consequence of this, many reconciler errors appear in the logs (see below).
My completely uneducated guess is that there is some race condition in either the check of whether project-team pairs are valid or the updating of them, which causes the wrong team to be matched with the wrong project.
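For what it's worth, here is a minimal, standalone sketch (my own, not part of the operator) for checking the team-to-project mapping against Sentry's documented Web API instead of eyeballing the UI. The base URL, token, and organization slug are placeholders, and pagination is ignored:
```go
// check_teams.go: lists each team's projects via Sentry's Web API
// (GET /api/0/organizations/{org}/teams/ and
//  GET /api/0/teams/{org}/{team}/projects/)
// and flags any team whose projects don't match the one-to-one setup above.
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"os"
)

type slugged struct {
	Slug string `json:"slug"`
}

// get performs an authenticated GET and decodes the JSON response into out.
func get(url, token string, out interface{}) error {
	req, err := http.NewRequest("GET", url, nil)
	if err != nil {
		return err
	}
	req.Header.Set("Authorization", "Bearer "+token)
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("GET %s: %s", url, resp.Status)
	}
	return json.NewDecoder(resp.Body).Decode(out)
}

func main() {
	base := os.Getenv("SENTRY_URL")    // e.g. https://sentry.example.com (placeholder)
	token := os.Getenv("SENTRY_TOKEN") // an API token with read access (placeholder)
	org := "my-org"                    // hypothetical organization slug

	var teams []slugged
	if err := get(fmt.Sprintf("%s/api/0/organizations/%s/teams/", base, org), token, &teams); err != nil {
		panic(err)
	}
	for _, t := range teams {
		var projects []slugged
		url := fmt.Sprintf("%s/api/0/teams/%s/%s/projects/", base, org, t.Slug)
		if err := get(url, token, &projects); err != nil {
			panic(err)
		}
		// With the one-to-one setup above, each team should list exactly one
		// project whose slug matches the team's slug.
		if len(projects) != 1 {
			fmt.Printf("team %q has %d projects\n", t.Slug, len(projects))
		}
		for _, p := range projects {
			if p.Slug != t.Slug {
				fmt.Printf("MISMATCH: team %q has project %q\n", t.Slug, p.Slug)
			}
		}
	}
}
```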
Expected behavior
- Projects should stay as part of the correct team and not move unexpectedly
Additional context
I see the following errors scrolling in the log of the sentry-operator-controller-manager:
2021-02-02T20:36:04.820Z ERROR controllers.ProjectKey failed to recreate ProjectKey {"projectkey": "sentry-operator-system/production-aaaa", "error": "sentry: The requested resource does not exist"}
github.com/go-logr/zapr.(*zapLogger).Error
/go/pkg/mod/github.com/go-logr/[email protected]/zapr.go:128
github.com/jace-ys/sentry-operator/controllers.(*ProjectKeyReconciler).Reconcile
/workspace/controllers/projectkey_controller.go:132
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:256
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:232
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker
/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:211
k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1
/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:152
k8s.io/apimachinery/pkg/util/wait.JitterUntil
/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:153
k8s.io/apimachinery/pkg/util/wait.Until
/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:88
2021-02-02T20:36:04.826Z ERROR controller-runtime.controller Reconciler error {"controller": "projectkey", "request": "sentry-operator-system/production-aaaa", "error": "sentry: The requested resource does not exist"}
github.com/go-logr/zapr.(*zapLogger).Error
/go/pkg/mod/github.com/go-logr/[email protected]/zapr.go:128
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:258
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:232
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker
/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:211
k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1
/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:152
k8s.io/apimachinery/pkg/util/wait.JitterUntil
/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:153
k8s.io/apimachinery/pkg/util/wait.Until
/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:88
2021-02-02T20:36:07.316Z ERROR controllers.ProjectKey failed to recreate ProjectKey {"projectkey": "sentry-operator-system/production-bbbb", "error": "sentry: The requested resource does not exist"}
github.com/go-logr/zapr.(*zapLogger).Error
/go/pkg/mod/github.com/go-logr/[email protected]/zapr.go:128
github.com/jace-ys/sentry-operator/controllers.(*ProjectKeyReconciler).Reconcile
/workspace/controllers/projectkey_controller.go:132
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:256
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:232
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker
/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:211
k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1
/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:152
k8s.io/apimachinery/pkg/util/wait.JitterUntil
/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:153
k8s.io/apimachinery/pkg/util/wait.Until
/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:88
2021-02-02T20:36:07.322Z ERROR controller-runtime.controller Reconciler error {"controller": "projectkey", "request": "sentry-operator-system/production-bbbb", "error": "sentry: The requested resource does not exist"}
github.com/go-logr/zapr.(*zapLogger).Error
/go/pkg/mod/github.com/go-logr/[email protected]/zapr.go:128
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:258
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:232
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker
/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:211
k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1
/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:152
k8s.io/apimachinery/pkg/util/wait.JitterUntil
/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:153
k8s.io/apimachinery/pkg/util/wait.Until
/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:88
2021-02-02T20:36:11.010Z ERROR controllers.ProjectKey failed to recreate ProjectKey {"projectkey": "sentry-operator-system/production-cccc", "error": "sentry: The requested resource does not exist"}
github.com/go-logr/zapr.(*zapLogger).Error
/go/pkg/mod/github.com/go-logr/[email protected]/zapr.go:128
github.com/jace-ys/sentry-operator/controllers.(*ProjectKeyReconciler).Reconcile
/workspace/controllers/projectkey_controller.go:132
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:256
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:232
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker
/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:211
k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1
/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:152
k8s.io/apimachinery/pkg/util/wait.JitterUntil
/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:153
k8s.io/apimachinery/pkg/util/wait.Until
/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:88
2021-02-02T20:36:11.016Z ERROR controller-runtime.controller Reconciler error {"controller": "projectkey", "request": "sentry-operator-system/production-cccc", "error": "sentry: The requested resource does not exist"}
github.com/go-logr/zapr.(*zapLogger).Error
/go/pkg/mod/github.com/go-logr/[email protected]/zapr.go:128
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:258
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:232
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker
/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:211
k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1
/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:152
k8s.io/apimachinery/pkg/util/wait.JitterUntil
/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:153
k8s.io/apimachinery/pkg/util/wait.Until
/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:88
This project seems super promising, thank you for creating it! I just hope this bug can get worked out, as it's stopping me from using it with a self-hosted instance of Sentry.
Hi @jwoglom!
Thanks for raising this issue. This doesn't sound desirable indeed, sorry if it caused you any inconvenience! I will have a look later today and try to resolve it ASAP 🙂
When you say "some amount of time", what's the rough scale of time here? Minutes/hours/days etc.? And have you seen it happen right after redeploying the operator?
Hi -- it was running for about 3-4 days. I deployed and set up sentry-operator on Thursday/Friday, and saw the initial projects and teams get created (scrolling through the Sentry UI, everything appeared to be in order). Then today (Tuesday) I noticed the mismatches, and that the errors I posted above had been scrolling in the logs for at least 6 hours or so (I could only go back so far with kubectl logs).
I haven't seen it happen immediately after redeploying the operator, but I will look further into this tomorrow.
Great, thanks for the info! Will let you know what I find, cheers!
Hey @jwoglom!
I just spent some time looking into this, but I can't seem to replicate the issue you're seeing yet (maybe I'm not testing for long enough). Re-deploying the operator doesn't seem to cause any issues for me. Just some observations/questions I have in mind:
- I noted that you're using a self-hosted version of Sentry. What version do you happen to be running? I have only been testing against the cloud version of Sentry, so it's possible that the API or version used for self-hosted Sentry is different, causing the API calls to do some unexpected things (doesn't seem likely though).
- The API (for Sentry cloud) doesn't actually allow you to change the team of a project through the API (I have noted this in an earlier comment and had to document an explicit workaround for changing a project's team). So it's quite strange that the bug you're seeing is the opposite of what the API allows. In this case, what might be happening is that the projects are getting deleted and re-created entirely under a different team rather than moved 🤔 Did you notice this? (i.e. Sentry error history/data being lost or similar?) A rough sketch of the kind of pattern I mean follows this list.
- When you redeployed the Team, Project, and ProjectKey CRDs, were they modified in any way? The logs you shared seem to suggest that the Sentry API client is not able to find the project in the organization when recreating the key, even though you can see it exists (projects "aaaa", "bbbb", "cccc" still exist, just under different teams).
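To illustrate the second point: this is a purely hypothetical fragment, not the operator's actual reconcile code, showing how a naive "delete and recreate under the desired team" fallback could explain both the lost project history and the "requested resource does not exist" errors from the ProjectKey reconciler. The sentryClient interface is invented for this illustration only:
```go
// Hypothetical sketch: how a reconciler that cannot move a project between
// teams might fall back to deleting and recreating it, losing event history
// and briefly leaving ProjectKeys pointing at a project that no longer exists.
package sketch

import "fmt"

// sentryClient is an invented interface for this illustration; it is not the
// client the operator actually uses.
type sentryClient interface {
	GetProjectTeam(org, project string) (string, error)
	DeleteProject(org, project string) error
	CreateProject(org, team, project string) error
}

// reconcileProject ensures the Sentry project belongs to the desired team.
func reconcileProject(c sentryClient, org, project, desiredTeam string) error {
	actualTeam, err := c.GetProjectTeam(org, project)
	if err != nil {
		return err
	}
	if actualTeam == desiredTeam {
		return nil
	}
	// The Sentry API doesn't let you move a project between teams, so a naive
	// reconciler might delete and recreate instead. If the ProjectKey
	// controller runs in the window between the delete and the create, or if
	// two reconciles race here, key operations fail with
	// "The requested resource does not exist".
	if err := c.DeleteProject(org, project); err != nil {
		return err
	}
	if err := c.CreateProject(org, desiredTeam, project); err != nil {
		return fmt.Errorf("recreate project %s under team %s: %w", project, desiredTeam, err)
	}
	return nil
}
```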
I will try leaving the operator running in my cluster overnight and see if this issue pops up tomorrow. If I get the time this weekend, I will try spinning up an instance of self-hosted Sentry in my cluster to test against. Are you deploying Sentry via a Helm chart?
Hi @jace-ys, thanks so much for looking into this. I am deploying a modified version of the Sentry Helm chart to a Kubernetes cluster -- long story short, I exported the Sentry chart to YAML, disabled the built-in ClickHouse cluster since it was causing some problems in my environment, and made some other minor configuration tweaks.
Some things I did do, if it helps with testing: I deleted the deployment (i.e., kubectl delete'd all of the resources) and then re-applied it, and I also triggered a rollout restart of the sentry-operator-controller-manager deployment after each apply. Maybe some combination of these inadvertently got everything into a bad state.
When I have time I'll try and see if I can reproduce this again and log what I did.
Yeah, it's possible that some sequence of changes led to corrupted state / data races somewhere... I'll have another dig through the code to see if I can spot anything obvious.
I'll also try to spin up a Sentry instance in my cluster and test it against that as well. Thanks for your time raising this issue and help investigating! Hopefully we can resolve this soon 🙌