Best Practice to reconcile all objects at once
Type of question
Best practices
Question
What did you do?
I created a basic operator using the operator-sdk scaffolding tool.
One important use case involves enumerating all instances of a specific CRD and generating a single configuration file, which will be mounted into a DaemonSet. Basically it's like the NGINX ingress controller, which takes all Ingress objects and generates one big nginx.conf file.
What did you see?
After scaffolding, the entry point for CRD processing is the Reconcile function of the controller.
I get a Reconcile event and the corresponding function call whenever:
- The controller is started, for all existing objects (is this guaranteed?)
- There is a modification to an existing object
Because I can't generate a valid configuration without knowing every instance, the pattern of handling each object individually feels a bit fragile and cumbersome: I would need to keep track of state that the API server knows best.
I really want to avoid keeping track of the objects internally.
Best practice for processing all instances at once?
The first thing that came to my mind was to use the Reconcile function just to trigger the processing of all instances,
which obviously starts by listing all objects using the .Client.List function.
I could use some debouncing logic to prevent listing instances over and over, especially at startup, like:
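func (r *RouteReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	log := ctrllog.FromContext(ctx)
	log.Info("scheduling reconcileAll", "trigger", req.NamespacedName)
	// debounce (helper not shown) collapses bursts of calls into a single run.
	debounce(func() { reconcileAll(context.Background(), r.Client) }, time.Second*1)
	return ctrl.Result{}, nil
}

func reconcileAll(ctx context.Context, c client.Client) {
	list := &testv1.RouteList{}
	if err := c.List(ctx, list); err != nil {
		return // error handling omitted
	}
	// generate routerconfig for all objects ...
}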
This seems a little bit flawed, because:
- Should I use the .List function of the Client (r.Client.List), or directly from the controller (r.List)?
- How should I handle logging? Passing the logr.Logger to the reconcileAll func seems wrong, as it is scoped to that specific reconcile.
- Creating my own context.Context is unavoidable, since the original context will be completed. Is there anything to consider, e.g. for teardown?
But what I really want to know is whether there is a best practice for implementing reconciliation of all objects at once.
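The debounce helper called in the snippet above is not shown in the question. As a minimal sketch, assuming a trailing-edge debounce built on time.AfterFunc, it could look roughly like the following. The names Debouncer, NewDebouncer, Trigger and runBurst are hypothetical and use only the standard library; the API also differs slightly from the debounce(fn, duration) call in the question.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// Debouncer collapses bursts of Trigger calls into a single call of fn,
// executed once no further Trigger arrives within the wait window.
// (Hypothetical helper, not part of controller-runtime.)
type Debouncer struct {
	mu    sync.Mutex
	timer *time.Timer
	wait  time.Duration
	fn    func()
}

func NewDebouncer(wait time.Duration, fn func()) *Debouncer {
	return &Debouncer{wait: wait, fn: fn}
}

// Trigger (re)starts the timer; only the last call in a burst fires fn.
func (d *Debouncer) Trigger() {
	d.mu.Lock()
	defer d.mu.Unlock()
	if d.timer != nil {
		d.timer.Stop()
	}
	d.timer = time.AfterFunc(d.wait, d.fn)
}

// runBurst triggers the debouncer n times in quick succession and
// reports how often the debounced function actually ran.
func runBurst(n int) int32 {
	var calls int32
	d := NewDebouncer(50*time.Millisecond, func() { atomic.AddInt32(&calls, 1) })
	for i := 0; i < n; i++ {
		d.Trigger()
	}
	time.Sleep(300 * time.Millisecond) // wait for the timer to fire
	return atomic.LoadInt32(&calls)
}

func main() {
	fmt.Println("reconcileAll runs:", runBurst(10))
}
```

Note that fn runs on the timer's own goroutine, outside any Reconcile call, which is why the questions about logging and context further down matter.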
Environment
Operator type:
/language go
Kubernetes cluster type:
Currently in dev/eval state using kind.
$ operator-sdk version
operator-sdk version: "v1.23.0", commit: "1eaeb5adb56be05fe8cc6dd70517e441696846a4", kubernetes version: "1.24.2", go version: "go1.18.5", GOOS: "linux", GOARCH: "amd64"
$ go version (if language is Go)
go version go1.18.1 linux/amd64
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.2", GitCommit:"faecb196815e248d3ecfb03c680a4507229c2a56", GitTreeState:"clean", BuildDate:"2021-01-13T13:28:09Z", GoVersion:"go1.15.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.2", GitCommit:"faecb196815e248d3ecfb03c680a4507229c2a56", GitTreeState:"clean", BuildDate:"2021-01-21T01:11:42Z", GoVersion:"go1.15.5", Compiler:"gc", Platform:"linux/amd64"}
Thank you!
@gprossliner I'll start out with answering your questions and then go into some of my thoughts/questions regarding your use case.
The controller is started, for all existing objects (is this guaranteed?)
This is correct. Whenever the controller is started, any CRs on the cluster that the controller has permissions to reconcile will be reconciled.
Should I use the .List function of the Client (r.Client.List), or directly from the controller (r.List).
I would recommend using the List function from the controller (r.List()) as it uses the controller's cache rather than hitting the API server every time like it would if you used r.Client.List().
How should I handle logging? Passing the logr.Logger to the reconcileAll func seems wrong, as it is scoped to that specific reconcile.
I would probably recommend creating a logger for the context that is passed to the reconcileAll function. So inside the reconcileAll function essentially do:
log := ctrllog.FromContext(ctx)
I think this would set up a logger properly for that function using the context.
Creating my own context.Context is unavoidable, since the original context will be completed. Is there anything to consider, e.g. for teardown?
I would recommend taking a look at creating a new context.Context along with a cancel function that can be used to close the context at the end of the reconcileAll function. Here is some documentation on what I am referring to: https://pkg.go.dev/context#example-WithCancel
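As a rough illustration of that suggestion, here is a standard-library-only sketch. It assumes the parent is a long-lived context owned by the operator process (for example one that is cancelled on shutdown), and the helper names runReconcileAll and demo are hypothetical. The reconcileAll here is a stand-in that only counts items.

```go
package main

import (
	"context"
	"fmt"
)

// reconcileAll stands in for the "process every object" step. It checks
// ctx between items so that cancellation (e.g. on shutdown) stops the work.
func reconcileAll(ctx context.Context, items []string) (int, error) {
	processed := 0
	for range items {
		if err := ctx.Err(); err != nil {
			return processed, err
		}
		processed++
	}
	return processed, nil
}

// runReconcileAll derives a cancellable context from parent and releases
// it as soon as reconcileAll returns.
func runReconcileAll(parent context.Context, items []string) (int, error) {
	ctx, cancel := context.WithCancel(parent)
	defer cancel()
	return reconcileAll(ctx, items)
}

// demo runs reconcileAll over three items. With stopFirst set, the parent
// context is cancelled before the run, as it would be during teardown.
// It returns the number of processed items and whether an error occurred.
func demo(stopFirst bool) (int, bool) {
	parent, stop := context.WithCancel(context.Background())
	defer stop()
	if stopFirst {
		stop()
	}
	n, err := runReconcileAll(parent, []string{"a", "b", "c"})
	return n, err != nil
}

func main() {
	fmt.Println(demo(false)) // 3 false
	fmt.Println(demo(true))  // 0 true
}
```

Deriving from a long-lived parent rather than from context.Background() means a shutdown also cancels any reconcileAll run that is still in flight.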
Regarding your use case, I have a couple questions:
- If a new instance of your RouteList CRD gets created after you have already created the single RouterConfig resource, will you need to delete the RouterConfig and create a new one?
  - As a follow up to this question, could you instead verify the existence of the RouterConfig resource and just update it every time a RouteList CR is created and reconciled?
  - If this is the case, and the scenario in my follow up question can't be done then I think your current approach is reasonable. I'm not too familiar with this use case so I can't say whether or not it is a best practice approach.
I hope this information helps!
Thank you @everettraven for your response, and for clarification of my first questions.
- If a new instance of your RouteList CRD gets created after you have already created the single RouterConfig resource, will you need to delete the RouterConfig and create a new one?
Actually, I have two CRDs: RouterConfig, of which there will in most cases be only one per cluster, and Route. They are associated by sharing a key in spec. RouteList is not a CRD of its own, but the list of all Route instances.
As a follow up to this question, could you instead verify the existence of the RouterConfig resource and just update it every time a RouteList CR is created and reconciled?
This was one of my experiments. I added a label in the Reconcile of the RouteController, which triggers a Reconcile for the RouterConfig object. In principle it worked, but I got the infamous error that occurs when two different controllers update the same resource (RouterConfig in my case), especially at operator startup, because the two controllers reconcile existing instances in parallel. So I tried to modify an object only in its own controller, not in a different one. Would you recommend this convention, if possible?
If I could disable running controllers in parallel, this could resolve this issue.
@gprossliner Okay, I think I have a little more context now. I think you may be able to do something similar to what you described but without triggering the Reconcile for the RouterConfig from the RouteController.
What I was wondering in my follow-up question was if you would be able to have a user create a RouterConfig CR that they reference in the creation of a Route CR? This way when you reconcile a Route CR you can get the corresponding RouterConfig and update the necessary values. Upon updating the necessary fields in the RouterConfig CR it should trigger the Reconcile function for the RouterConfig CR.
For more detail this is kind of the design I was thinking:
graph TD
A[Create RouterConfig] --> B{{RouterConfig Foo}}
C[Create Route] --> D{{Route Bar}}
D -- References --> B
E(RouteController) -- Reconciles --> D
E -- Updates --> B
F(RouteConfigController) -- Reconciles --> B
For an example of how a user would step through this:
- Create a RouterConfig CR:
apiVersion: routers/v1
kind: RouterConfig
metadata:
  name: foo
spec:
  ...
- Create a Route CR that should be used in RouterConfig named foo:
apiVersion: routers/v1
kind: Route
metadata:
  name: bar
spec:
  routerConfigName: foo
  ...
- Once step 2 is done, the RouteController can update the RouterConfig that was created in step 1 as necessary
- Once the RouterConfig is updated in step 3 the RouterConfigController will reconcile the updated RouterConfig CR and do what it needs to do.
Do you think that this kind of flow would work for your particular use case?
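The data flow described above can be sketched in plain Go without any Kubernetes client: all routes that reference a given RouterConfig by name are collected and rendered into one configuration. This is only an illustration under assumptions. The Route struct is a simplified stand-in, the Path field and the output format are invented for the example, and renderConfig and sampleConfigs are hypothetical names.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Route is a simplified stand-in for the Route CR from the thread.
type Route struct {
	Name             string
	RouterConfigName string // mirrors spec.routerConfigName
	Path             string // hypothetical field, for illustration only
}

// renderConfig builds one configuration for the named RouterConfig from
// the full list of routes. Routes are sorted by name so that the result
// does not depend on the order in which they were listed.
func renderConfig(routerConfig string, routes []Route) string {
	var selected []Route
	for _, r := range routes {
		if r.RouterConfigName == routerConfig {
			selected = append(selected, r)
		}
	}
	sort.Slice(selected, func(i, j int) bool { return selected[i].Name < selected[j].Name })
	var b strings.Builder
	for _, r := range selected {
		fmt.Fprintf(&b, "route %s %s\n", r.Name, r.Path)
	}
	return b.String()
}

// sampleConfigs renders the same routes in two different list orders.
func sampleConfigs() (string, string) {
	a := []Route{{"bar", "foo", "/bar"}, {"baz", "foo", "/baz"}, {"other", "qux", "/x"}}
	b := []Route{a[2], a[1], a[0]}
	return renderConfig("foo", a), renderConfig("foo", b)
}

func main() {
	first, second := sampleConfigs()
	fmt.Print(first)
	fmt.Println(first == second) // true
}
```

Because the output is derived from the complete list every time, running it again with unchanged routes yields the same configuration, which keeps the RouterConfig reconcile idempotent.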
Thank you very much for your detailed answer. This is exactly my use case. I will evaluate your suggestions and let you know the outcome!
I've done some testing for your proposed solution. The critical path is when the RouteController updates the RouterConfig object, because it updates an object "owned" by a different controller. This is where I had problems in the past.
But I tried again, and added some logging to check what is about to happen. I even added a mutex to ensure only one controller runs Reconcile at any given time, but it turns out that this is not the problem.
The problem is that I get an outdated version from the cache. Here are some logs:
# reconcile is triggered on a route
1.664357609617001e+09 LEVEL(-2) Reconcile Route {"controller": "route", ...
# the controller looks up its RouterConfig (by .Client.List), finds a match resource-version=2351466
1.6643576096171207e+09 LEVEL(-2) RouterConfig loaded 2351466 {"controller": "route", ...
# the controller updates the RouterConfig (just an Annotation for this test), executes r.Update(ctx, routerConfig), resulting in ResourceVersion 2351524
1.664357609624411e+09 LEVEL(-2) RouterConfig updated 2351524 {"controller": "route", ...
# this update triggers a Reconcile of RouterConfig, as expected
1.6643576096283698e+09 LEVEL(-2) Reconcile RouterConfig {"controller": "routerconfig", ...
# >>> the RouterConfig controller loads the Object (by r.Get) from the ctrl.Request, but it gets Version 2351466, NOT 2351524!!!
1.664357609628595e+09 LEVEL(-2) Loaded RouterConfig ResourceVersion 2351466 {"controller": "routerconfig", ...
# which causes the following update to fail (as expected, because the current version is 2351524, and we got 2351466)
1.6643576096483934e+09 ERROR Unable to update the status {"controller": "routerconfig", .. the object has been modified; please apply your changes to the latest version and try again
Can you explain this behavior?
I created a repo, but it doesn't show the described behavior... I'll keep you updated
https://github.com/gprossliner/operator-demo
@gprossliner When you attempt to get the RouteConfig object in the RouteController are you using the r.List() or r.Client.List() function? My understanding of this particular scenario would be that you would want to use the r.Client.List() function for fetching any RouteConfig objects from the cluster to make modifications to and when listing Route objects you would want to use r.List(). Hopefully this helps!
I created a repo, but it doesn't show the described behavior... I'll keep you updated
https://github.com/gprossliner/operator-demo
I will plan to take a look at this, thanks for sharing!
/assign
@gprossliner I took a look at the RouteController from the repo you shared and I think my previous comment regarding which List function to use should resolve your issue. This list call:
https://github.com/gprossliner/operator-demo/blob/2825b14e169e43b7da63e51a469feffd68eba6f6/controllers/route_controller.go#L74-L77
should be:
// find the corresponding RouterConfig
routerConfigs := &groupv1.RouterConfigList{}
log.Info(fmt.Sprintf("<- LIST RouterConfig"))
err = r.Client.List(ctx, routerConfigs)
Thank you, I'm out of office until tomorrow, but I will check as soon as I return.
Hi! I now use the .Client methods everywhere, basically so that I can wrap the API calls in custom functions for central logging.
In my real project, the problem is still there; it happens about 1 in 3 times. In the test repo that I've shared, I am still not able to recreate the issue. I will now start to tear down the logic and compare what is different in both repos line by line, again.
Regarding r.List vs r.Client.List:
I'm not really a Go expert, but are those calls really different?
- If I use "Go to definition" in VS Code on r.List, I get to the "Reader" interface of controller-runtime (https://github.com/kubernetes-sigs/controller-runtime/blob/7399a3a595bf254add9d0c96c49af462e1aac193/pkg/client/interfaces.go#L58)
- If I check where r.Client comes from, it is the struct member created by the operator-sdk, as shown in my repo:
type RouteReconciler struct {
client.Client // client here
Scheme *runtime.Scheme
}
- When I use the VS Code debugger extension (ID golang.go) to Step Into r.List or r.Client.List, I land on the same func (https://github.com/kubernetes-sigs/controller-runtime/blob/master/pkg/client/split.go#L134)
So am I calling the same func anyway, regardless of whether I use r.List or r.Client.List? I tested this here: https://go.dev/play/p/XMILc-FQbUC?v=gotip.go
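The Go language rule behind this observation is method promotion through an embedded field: a method of the embedded value can be called directly on the outer struct, and both spellings reach the same method on the same value. A minimal standard-library sketch, with Lister and countingClient as hypothetical stand-ins for client.Client and its implementation:

```go
package main

import "fmt"

// Lister is a stand-in for the client.Client interface.
type Lister interface {
	List() string
}

// countingClient records how often List is called.
type countingClient struct{ calls int }

func (c *countingClient) List() string {
	c.calls++
	return "routes"
}

// RouteReconciler embeds the interface, in the same way the scaffolded
// reconciler embeds client.Client.
type RouteReconciler struct {
	Lister
}

// callBoth invokes List via the promoted method and via the embedded
// field, and returns how many calls reached the one underlying client.
func callBoth() int {
	c := &countingClient{}
	r := &RouteReconciler{Lister: c}
	r.List()        // promoted method
	r.Lister.List() // explicit access through the embedded field
	return c.calls
}

func main() {
	fmt.Println(callBoth()) // 2
}
```

Both calls increment the same counter, so with an embedded client the choice between r.List and r.Client.List is purely stylistic.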
I was finally able to reproduce the issue in the demo repo. It seems that once all Route objects had been reconciled, there never really was an update to RouterConfig, because nothing had changed. Now I include a Token field in the RouterConfig.Status struct, which is updated every time a Route object is reconciled (https://github.com/gprossliner/operator-demo/commit/db9e0a7320444ad8f9293d5a4d6a4a9ff668aae9 and https://github.com/gprossliner/operator-demo/commit/f22da1b26977ca71a8e17cbb0b910e57705c0857).
These are the logs:
1.6649591828762345e+09 INFO Reconcile Route {"controller": "route", "controllerGroup": "group.test.org", "controllerKind": "Route", "route": {"name":"route-sample1","namespace":"default"}, "namespace": "default", "name": "route-sample1", "reconcileID": "6ea40183-a0fb-4a14-bc23-65825cc79d37"}
1.6649591828762524e+09 INFO <- GET Route {"controller": "route", "controllerGroup": "group.test.org", "controllerKind": "Route", "route": {"name":"route-sample1","namespace":"default"}, "namespace": "default", "name": "route-sample1", "reconcileID": "6ea40183-a0fb-4a14-bc23-65825cc79d37"}
1.6649591828762975e+09 INFO -> GET Route ResourceVersion=2372295 {"controller": "route", "controllerGroup": "group.test.org", "controllerKind": "Route", "route": {"name":"route-sample1","namespace":"default"}, "namespace": "default", "name": "route-sample1", "reconcileID": "6ea40183-a0fb-4a14-bc23-65825cc79d37"}
1.664959182876307e+09 INFO <- LIST RouterConfig {"controller": "route", "controllerGroup": "group.test.org", "controllerKind": "Route", "route": {"name":"route-sample1","namespace":"default"}, "namespace": "default", "name": "route-sample1", "reconcileID": "6ea40183-a0fb-4a14-bc23-65825cc79d37"}
1.6649591828763332e+09 INFO -> LIST ITEM RouterConfig ResourceVersion=3447815 {"controller": "route", "controllerGroup": "group.test.org", "controllerKind": "Route", "route": {"name":"route-sample1","namespace":"default"}, "namespace": "default", "name": "route-sample1", "reconcileID": "6ea40183-a0fb-4a14-bc23-65825cc79d37"}
1.664959182876349e+09 INFO <- POST STATUS RouterConfig ResourceVersion=3447815 {"controller": "route", "controllerGroup": "group.test.org", "controllerKind": "Route", "route": {"name":"route-sample1","namespace":"default"}, "namespace": "default", "name": "route-sample1", "reconcileID": "6ea40183-a0fb-4a14-bc23-65825cc79d37"}
1.6649591828767962e+09 INFO Starting workers {"controller": "routerconfig", "controllerGroup": "group.test.org", "controllerKind": "RouterConfig", "worker count": 1}
1.664959182876901e+09 INFO Reconcile RouteConfig {"controller": "routerconfig", "controllerGroup": "group.test.org", "controllerKind": "RouterConfig", "routerConfig": {"name":"routerconfig-sample","namespace":"default"}, "namespace": "default", "name": "routerconfig-sample", "reconcileID": "41adbf8a-fea7-45b3-b6a4-d28e1349fc2a"}
1.6649591828769143e+09 INFO <- GET RouteConfig {"controller": "routerconfig", "controllerGroup": "group.test.org", "controllerKind": "RouterConfig", "routerConfig": {"name":"routerconfig-sample","namespace":"default"}, "namespace": "default", "name": "routerconfig-sample", "reconcileID": "41adbf8a-fea7-45b3-b6a4-d28e1349fc2a"}
1.6649591828769646e+09 INFO -> GET RouteConfig ResourceVersion=3447815 {"controller": "routerconfig", "controllerGroup": "group.test.org", "controllerKind": "RouterConfig", "routerConfig": {"name":"routerconfig-sample","namespace":"default"}, "namespace": "default", "name": "routerconfig-sample", "reconcileID": "41adbf8a-fea7-45b3-b6a4-d28e1349fc2a"}
1.6649591828769715e+09 INFO <- POST STATUS RouteConfig ResourceVersion=3447815 {"controller": "routerconfig", "controllerGroup": "group.test.org", "controllerKind": "RouterConfig", "routerConfig": {"name":"routerconfig-sample","namespace":"default"}, "namespace": "default", "name": "routerconfig-sample", "reconcileID": "41adbf8a-fea7-45b3-b6a4-d28e1349fc2a"}
1.6649591828800645e+09 INFO -> POST STATUS RouteConfig ResourceVersion=3447829 {"controller": "routerconfig", "controllerGroup": "group.test.org", "controllerKind": "RouterConfig", "routerConfig": {"name":"routerconfig-sample","namespace":"default"}, "namespace": "default", "name": "routerconfig-sample", "reconcileID": "41adbf8a-fea7-45b3-b6a4-d28e1349fc2a"}
1.6649591828802722e+09 INFO Reconcile RouteConfig {"controller": "routerconfig", "controllerGroup": "group.test.org", "controllerKind": "RouterConfig", "routerConfig": {"name":"routerconfig-sample","namespace":"default"}, "namespace": "default", "name": "routerconfig-sample", "reconcileID": "f8426fca-6def-4b45-ab0b-64892083c870"}
1.664959182880289e+09 INFO <- GET RouteConfig {"controller": "routerconfig", "controllerGroup": "group.test.org", "controllerKind": "RouterConfig", "routerConfig": {"name":"routerconfig-sample","namespace":"default"}, "namespace": "default", "name": "routerconfig-sample", "reconcileID": "f8426fca-6def-4b45-ab0b-64892083c870"}
1.6649591828803065e+09 INFO -> GET RouteConfig ResourceVersion=3447829 {"controller": "routerconfig", "controllerGroup": "group.test.org", "controllerKind": "RouterConfig", "routerConfig": {"name":"routerconfig-sample","namespace":"default"}, "namespace": "default", "name": "routerconfig-sample", "reconcileID": "f8426fca-6def-4b45-ab0b-64892083c870"}
1.6649591828803165e+09 INFO <- POST STATUS RouteConfig ResourceVersion=3447829 {"controller": "routerconfig", "controllerGroup": "group.test.org", "controllerKind": "RouterConfig", "routerConfig": {"name":"routerconfig-sample","namespace":"default"}, "namespace": "default", "name": "routerconfig-sample", "reconcileID": "f8426fca-6def-4b45-ab0b-64892083c870"}
1.664959182909424e+09 INFO -> POST STATUS RouteConfig ResourceVersion=3447829 {"controller": "routerconfig", "controllerGroup": "group.test.org", "controllerKind": "RouterConfig", "routerConfig": {"name":"routerconfig-sample","namespace":"default"}, "namespace": "default", "name": "routerconfig-sample", "reconcileID": "f8426fca-6def-4b45-ab0b-64892083c870"}
1.664959182909447e+09 ERROR -> POST RouteConfig {"controller": "route", "controllerGroup": "group.test.org", "controllerKind": "Route", "route": {"name":"route-sample1","namespace":"default"}, "namespace": "default", "name": "route-sample1", "reconcileID": "6ea40183-a0fb-4a14-bc23-65825cc79d37", "error": "Operation cannot be fulfilled on routerconfigs.group.test.org \"routerconfig-sample\": the object has been modified; please apply your changes to the latest version and try again"}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile
/home/[email protected]/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:121
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
/home/[email protected]/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:320
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
/home/[email protected]/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:273
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
/home/[email protected]/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:234
1.6649591829095078e+09 INFO <- POST STATUS Route ResourceVersion=2372295 {"controller": "route", "controllerGroup": "group.test.org", "controllerKind": "Route", "route": {"name":"route-sample1","namespace":"default"}, "namespace": "default", "name": "route-sample1", "reconcileID": "6ea40183-a0fb-4a14-bc23-65825cc79d37"}
1.6649591829137254e+09 INFO -> POST STATUS Route ResourceVersion=2372295 {"controller": "route", "controllerGroup": "group.test.org", "controllerKind": "Route", "route": {"name":"route-sample1","namespace":"default"}, "namespace": "default", "name": "route-sample1", "reconcileID": "6ea40183-a0fb-4a14-bc23-65825cc79d37"}
Error happens 1 of 3 times I tested.
Hi @gprossliner,
Regarding r.List vs r.Client.List:
I'm not really a Go expert, but are those calls really different?
You definitely bring up some good points, and it may be the case that I misunderstood and that they simply are the same. My understanding was that r.List would check the cache first while r.Client.List would always hit the kube api server. After looking at your analysis I believe that they are simply the same and I just had a misunderstanding (I never really looked in depth at the difference so thanks for bringing it up - I learned something new today :smile: ).
Error happens 1 of 3 times I tested.
So I did some further looking at your sample operator code, and with this use case it may just be that because there are 2 controllers making modifications to a CR you can't perform a regular Update without at some point running into this issue. The way around this is to instead use a Patch and use server-side apply.
Here is a Kubernetes doc on server-side apply: https://kubernetes.io/docs/reference/using-api/server-side-apply/
One thing to note is that it is best practice to not use concrete api objects for server-side apply patches as the nil values of certain fields would be applied. In this case you would want to use something like an unstructured.Unstructured and include only the fields you need and then use that unstructured.Unstructured object when applying your patch.
I actually recently had to do this with a controller I was working on because it was updating resources that were also reconciled by another controller and would occasionally run into the same problem you are having. Here are some links to the code that I wrote for that particular controller:
- actually doing the patch: https://github.com/operator-framework/oria-operator/blob/a77f66f5adb55601f90b7bc6bdbf28bfb911f4a0/controllers/scopeinstance_controller.go#L199-L205
- function for creating the unstructured.Unstructured object: https://github.com/operator-framework/oria-operator/blob/a77f66f5adb55601f90b7bc6bdbf28bfb911f4a0/controllers/scopeinstance_controller.go#L209-L222
- function for doing a patch with server-side apply: https://github.com/operator-framework/oria-operator/blob/a77f66f5adb55601f90b7bc6bdbf28bfb911f4a0/controllers/scopeinstance_controller.go#L282-L288
To me the rest of your controller logic looks good, and to resolve this error I would recommend doing the server-side apply patching instead of a traditional update. I hope this information helps!
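To make the difference concrete, here is a deliberately simplified in-memory model. It is not the Kubernetes API and not server-side apply itself: the store, update and patch names are hypothetical, and the model only shows why a full-object update with a version precondition fails on a stale copy, while a write limited to the caller's own fields does not.

```go
package main

import (
	"errors"
	"fmt"
)

var errConflict = errors.New("the object has been modified")

// object is a toy resource: a version counter plus a flat field map.
type object struct {
	resourceVersion int
	fields          map[string]string
}

// store is a hypothetical in-memory stand-in for the API server.
type store struct{ current object }

// get returns a deep copy, like a client reading the object.
func (s *store) get() object {
	c := object{resourceVersion: s.current.resourceVersion, fields: map[string]string{}}
	for k, v := range s.current.fields {
		c.fields[k] = v
	}
	return c
}

// update replaces the whole object and is rejected for a stale copy.
func (s *store) update(o object) error {
	if o.resourceVersion != s.current.resourceVersion {
		return errConflict
	}
	o.resourceVersion++
	s.current = o
	return nil
}

// patch merges only the given fields and has no version precondition.
func (s *store) patch(fields map[string]string) {
	for k, v := range fields {
		s.current.fields[k] = v
	}
	s.current.resourceVersion++
}

// simulate lets two controllers race on one object. It returns the final
// fields and the error from the stale full update.
func simulate() (map[string]string, error) {
	s := &store{current: object{resourceVersion: 1, fields: map[string]string{}}}
	a, b := s.get(), s.get() // both controllers read version 1
	a.fields["status"] = "ready"
	_ = s.update(a) // first controller wins, version becomes 2
	b.fields["token"] = "abc"
	err := s.update(b) // second controller still holds version 1
	s.patch(map[string]string{"token": "abc"})
	return s.get().fields, err
}

func main() {
	fields, err := simulate()
	fmt.Println(err)
	fmt.Println(fields["status"], fields["token"])
}
```

Real server-side apply additionally tracks which field manager owns which field and can report conflicts between managers, which this toy model leaves out.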
@gprossliner I just wanted to follow up. Did you have a chance to try out using server-side apply as mentioned in my previous comment?
Hi there! Not yet, I'm sorry, but I will keep you informed.
Is there any way to check if a reconciliation is "initial", that means on startup for all resources, or from a change of a resource? Or to disable the initial reconciliation of a specific kind?
This would also help me a lot, since there is no point in reconciling all Route objects one at a time at startup, as I never need to act on individual routes, but only ever on all routes for a single RouterConfig object.
The logic would be:
- On all reconciliations of the RouterConfig - initial or change - I would list all Route objects, and process them, basically by generating a configuration for an external system
- On initial reconciliations of the Route objects do nothing. I won't miss any changes, because they are processed anyway when the RouterConfig is initially reconciled (1).
- On subsequent reconciliations of a Route object, because of a new or edited route, I update (or patch) the corresponding RouterConfig, which triggers (2).
Is there any way to check if a reconciliation is "initial", that means on startup for all resources, or from a change of a resource? Or to disable the initial reconciliation of a specific kind?
This would also help me a lot, since there is no point in reconciling all Route objects one at a time at startup, as I never need to act on individual routes, but only ever on all routes for a single RouterConfig object.
I think it is possible to perform that kind of logic but my understanding is that something like that would go against the best practice of keeping your reconciliation loop idempotent. The goal with keeping it idempotent is that regardless of whether or not your CR is being created or updated the same logic is run to ensure the desired state is reached. The downside to not keeping the reconciliation idempotent is that it is possible for the operator to get stuck in a bad state that would require manual intervention.
With that being said, if you truly needed to do something unique on creation vs update I think using an annotation to determine if the CR is being created or updated would allow for this. In your reconciliation loop you could check for an annotation along the lines of "hasBeenReconciledBefore". If the CR has just been created it is likely it won't have that annotation and you can add the annotation to the CR and then return from the reconciliation loop. This would make it so when the CR is updated the annotation will likely exist and you will know that it is being updated instead of created.
The logic would be:
- On all reconciliations of the RouterConfig - initial or change - I would list all Route objects, and process them, basically by generating a configuration for an external system
- On initial reconciliations of the Route objects do nothing. I won't miss any changes, because they are processed anyway when the RouterConfig is initially reconciled (1).
- On subsequent reconciliations of a Route object, because of a new or edited route, I update (or patch) the corresponding RouterConfig, which triggers (2).
With this logic, what would happen if a new Route CR is created after the creation of the RouterConfig CR with no changes being made to the RouterConfig CR?
Thinking through this logic my conclusion is the new Route CR would essentially be ignored until it is updated OR the RouterConfig or any other Route CR is updated.
I hope this helps @gprossliner !
Thank you again for your patience and answer. Based on your comment, I was able to solve the problems with the optimistic concurrency error: https://github.com/gprossliner/operator-demo/commit/d4ee18492a6005b4d91bbfecb14a4f216e1ad348
I don't introduce a new field like the hasBeenReconciledBefore you mentioned, but I use the observedGeneration field of my status condition to check if something has changed, that forces me to update the corresponding RouterConfig object within the RouteController. I also added a finalizer so that I am able to remove the Route from the RouterConfig if it has been deleted.
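The observedGeneration check can be sketched in isolation as follows. This is a simplified model, not the code from the linked commit: the Route struct only mirrors the two relevant numbers, it assumes the generation is bumped when the spec changes, and reconcile and countSyncs are hypothetical names.

```go
package main

import "fmt"

// Route is a simplified stand-in for the Route CR.
type Route struct {
	Generation         int64 // mirrors metadata.generation (assumed to change on spec edits)
	ObservedGeneration int64 // mirrors the status condition's observedGeneration
}

// reconcile calls sync only if the spec changed since the last successful
// sync, then records the generation it acted on. It reports whether sync ran.
func reconcile(r *Route, sync func()) bool {
	if r.Generation == r.ObservedGeneration {
		return false // nothing changed, e.g. the resync at operator startup
	}
	sync()
	r.ObservedGeneration = r.Generation
	return true
}

// countSyncs replays a typical sequence of reconcile calls for one Route
// and returns how often the RouterConfig would have been touched.
func countSyncs() int {
	syncs := 0
	sync := func() { syncs++ }
	r := &Route{Generation: 1}
	reconcile(r, sync) // newly created object: sync runs
	reconcile(r, sync) // startup resync, nothing changed: skipped
	r.Generation = 2   // spec edited
	reconcile(r, sync) // sync runs again
	reconcile(r, sync) // skipped
	return syncs
}

func main() {
	fmt.Println(countSyncs()) // 2
}
```

The reconcile function stays idempotent: calling it repeatedly for an unchanged object has no further effect.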
With this logic, what would happen if a new Route CR is created after the creation of the RouterConfig CR with no changes being made to the RouterConfig CR?
Thinking through this logic my conclusion is the new Route CR would essentially be ignored until it is updated OR the RouterConfig or any other Route CR is updated.
I think you misunderstood me on this. Added objects need to (and should) be reconciled. What I meant by "initial" is the reconciliation of all existing objects on startup, even if nothing has changed, not of newly created instances.
A little off-topic, but I want to share this with you:
It still feels a little bit complex, because in a client-go based implementation I am in total control, and don't need to evaluate all context on each stateless Reconcile call. But I'm aware that this is not really related to operator-sdk, not even Kubebuilder, but to the controller-runtime package.
In my real project, I will still need client-go, because not all features are available for ctrl.Client, like subresources to exec into a container.
@gprossliner I'm glad that you were able to resolve the issue! Since you were able to resolve this issue, I'm going to close the issue for now. If you have any more questions related to this feel free to reopen.
If you have any other questions feel free to open a new issue or join the operator-sdk community slack channel and ask there.