consul-k8s
Sync changes as they occur in k8s and reduce full sync default interval
In addition to registering all services periodically, sync new services as they appear. This allows the full sync interval to be increased and reduces the overall number of catalog register calls made to the Consul servers. It makes a noticeable difference for Kubernetes clusters with hundreds of service instances, without delaying updates.
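Conceptually, this is just a sketch (not the actual consul-k8s code; the package name and the `register`, `fullSync`, and `registerCh` names are placeholders) of what combining event-driven registration with a less frequent full sync looks like:

```go
// Sketch only: placeholder package and helpers, not consul-k8s code.
package catalogsync

import (
	"context"
	"log"
	"time"

	"github.com/hashicorp/consul/api"
)

// register and fullSync stand in for the real Syncer calls.
func register(reg *api.CatalogRegistration) { log.Printf("registering node %s", reg.Node) }
func fullSync()                             { log.Print("running full sync") }

// run applies incremental registrations as they arrive from the
// Kubernetes watcher and keeps a (now much less frequent) full sync
// as a safety net.
func run(ctx context.Context, registerCh <-chan *api.CatalogRegistration, fullSyncInterval time.Duration) {
	ticker := time.NewTicker(fullSyncInterval)
	defer ticker.Stop()

	for {
		select {
		case reg := <-registerCh:
			// A new or changed service: push it to Consul immediately
			// instead of waiting for the next full sync.
			register(reg)
		case <-ticker.C:
			// The periodic full sync still re-registers everything.
			fullSync()
		case <-ctx.Done():
			return
		}
	}
}
```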
Updated version of #125 for investigation. This has the benefit of syncing things as they come in, but leaves some possible race conditions with service watchers and deregistrations. A full solution to the problems of (1) updating services as soon as they've changed in Kubernetes and (2) not creating a large number of Raft transactions every sync period for registrations that haven't changed may need to be a bit more invasive.
Adding additional details related to my last comment.
Context
The original PR is trying to solve two problems with the current catalog sync implementation:
- Update services in Consul as soon as they've changed in Kubernetes
- Reduce the cluster Raft noise that happens at the syncPeriod intervals when everything is re-registered whether it has changed or not.
By doing some of the work on (1), it's possible to reduce the syncPeriod, which mitigates but does not solve (2).
Issues with the implementation
All of these changes are made in the Syncer portion of the catalog sync process. This is the section of the code that interacts with Consul. Another piece of the code watches Kubernetes for changes, then creates Consul registrations and passes them to the Syncer.
The Syncer maintains a set of internal maps that keep track of (a) current valid Consul registrations, (b) services in Consul that should be deregistered, (c) current service watcher routines that are monitoring services in Consul, and (d) a quick reference table that allows lookup of valid services by service name rather than ID. These internal maps are the source of truth for what should be registered in Consul. The fundamental issue with the original PR is that it breaks this guarantee: these internal maps are no longer the source of truth.
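For illustration, a simplified shape of that state might look like the following; the field names are made up for this sketch and are not the actual consul-k8s struct fields:

```go
// Illustrative only: field names do not match the actual consul-k8s Syncer.
package catalogsync

import (
	"context"

	"github.com/hashicorp/consul/api"
)

type syncerState struct {
	// (a) current valid Consul registrations, keyed by service ID.
	registrations map[string]*api.CatalogRegistration

	// (b) services known to be in Consul that should be deregistered.
	deregistrations map[string]*api.CatalogRegistration

	// (c) cancel functions for the per-service watcher goroutines that
	// monitor the corresponding services in Consul.
	watchers map[string]context.CancelFunc

	// (d) quick lookup table: service name -> set of valid service IDs.
	serviceIDs map[string]map[string]struct{}
}
```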
Additionally, it modifies (a) and (d), but does not address the other two. Those would only be updated during the (much less frequent) syncFull run. Because service watchers aren't stopped for deregistered services, these routines could continue to run, potentially adding entries to (b) that no longer exist in Consul. This could produce spurious error messages in the logs when we try to deregister nonexistent services. Also, since service watchers aren't started for new services, we potentially won't remove invalid service instances from Consul in a timely manner.
More robust long-term solution
A better solution is to solve issue (2) properly by only registering services that have changed. This would reduce the Raft traffic significantly in all cases, but especially in large clusters. It would also allow us to sync at will because there would be no additional overhead, which makes solving (1) trivial.
This requires keeping track of the current state and comparing it to what is sent to the Syncer so that we deal in differences rather than in whole lists. Conceptually, this is simple, but there are all sorts of potential issues and edge cases that might make this complex to implement.
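As a rough sketch of the idea (the function and its naive `reflect.DeepEqual` comparison are illustrative, not the consul-k8s implementation), the diff step could look like:

```go
// Sketch of a change-only sync; not the consul-k8s implementation.
package catalogsync

import (
	"reflect"

	"github.com/hashicorp/consul/api"
)

// diff compares the desired registrations (built from the Kubernetes
// watch) with the state last applied to Consul and returns only the
// registrations that changed plus the service IDs that disappeared.
func diff(desired, applied map[string]*api.CatalogRegistration) (upserts []*api.CatalogRegistration, removals []string) {
	for id, want := range desired {
		if have, ok := applied[id]; !ok || !reflect.DeepEqual(want, have) {
			upserts = append(upserts, want)
		}
	}
	for id := range applied {
		if _, ok := desired[id]; !ok {
			removals = append(removals, id)
		}
	}
	return upserts, removals
}
```

With a diff like this, the Syncer could run on every change event without generating Raft writes for registrations that haven't changed.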
We are interested in this change; do you think you could rebase it so it can be integrated? cc @ShimmerGlass @i0rek
I'm running the catalog sync and I'm hitting all of the performance issues mentioned by @adilyse. Tracked in https://github.com/hashicorp/consul-k8s/issues/315 and https://github.com/hashicorp/consul-k8s/issues/319.
I believe the most scalable solution to this problem is to have a main watcher that watches all services (a service with the sync annotation being added or removed) and starts a dedicated watcher per service that takes care of syncing that service's changes (or endpoint changes, in my case) to Consul.
Each dedicated watcher needs to operate only on endpoints that it manages (via an ownership tag), so that syncing from multiple clusters doesn't cause problems by removing endpoints for the same service that exist in another k8s cluster.
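For illustration, a per-service watcher that respects an ownership tag might look roughly like this (the `ownershipTag` value and the `syncEndpoints` callback are placeholders for this sketch, not existing consul-k8s code):

```go
// Sketch only: ownershipTag and syncEndpoints are placeholders, not consul-k8s code.
package catalogsync

import (
	"context"

	"github.com/hashicorp/consul/api"
)

const ownershipTag = "k8s-cluster-a" // identifies which cluster owns an instance

// watchService is the kind of per-service watcher the main watcher would
// start. It blocks on catalog changes for one service and hands only the
// instances owned by this cluster to syncEndpoints, so instances synced
// from other clusters are never touched.
func watchService(ctx context.Context, client *api.Client, name string,
	syncEndpoints func(owned []*api.CatalogService)) error {

	var lastIndex uint64
	for ctx.Err() == nil {
		// Blocking query: returns when this service's catalog entry changes.
		instances, meta, err := client.Catalog().Service(name, "", &api.QueryOptions{
			WaitIndex: lastIndex,
		})
		if err != nil {
			return err
		}
		lastIndex = meta.LastIndex

		// Keep only the instances carrying this cluster's ownership tag.
		var owned []*api.CatalogService
		for _, inst := range instances {
			for _, tag := range inst.ServiceTags {
				if tag == ownershipTag {
					owned = append(owned, inst)
					break
				}
			}
		}
		syncEndpoints(owned)
	}
	return ctx.Err()
}
```

The tag filter could also be pushed server-side by passing the ownership tag as the tag argument to Catalog().Service; the explicit loop above just makes the ownership check visible.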