k8s to consul sync on changes and not via fullSync
Question
I am using the consul-k8s to sync services from k8s to consul. From the docs it is not clear when and how this sync is happening. After carefully reading the code, the comments say that changes should happen when changes are detected via the k8s controller but the code itself syncs services to consul only on fullSync. I would like to make this clear, is this a bug or the desired design?
CLI Commands (consul-k8s, consul-k8s-control-plane, helm)
/bin/sh -ec consul-k8s-control-plane sync-catalog \ -log-level=info \ -log-json=false \ -k8s-default-sync=false \ -to-k8s=false \ -consul-domain=consul \ -allow-k8s-namespace="*" \ -deny-k8s-namespace="kube-system" \ -deny-k8s-namespace="kube-public" \ -k8s-write-namespace=${NAMESPACE} \ -node-port-sync-type=ExternalFirst \ -consul-k8s-tag=runtime-k8s \ -consul-node-name=k8s-sync-eks-prd-euw1-dar-general \ -add-k8s-namespace-suffix \
Helm Configuration
# https://github.com/hashicorp/consul-k8s/blob/c79730ba4f0e25b8330950d7ab575e4e3cfc1ef0/hack/helm-reference-gen/fixtures/full-values.yaml
global:
enabled: true
logJOSN: true
domain: consul
image: "consul:${consul_version}"
datacenter: ${consul_datacenter}
imageK8S: hashicorp/consul-k8s-control-plane:${helm_version}
enablePodSecurityPolicies: false
# TODO: enable tls later on
tls:
enabled: false
server:
enabled: false # Consul servers are running outside K8S
externalServers:
enabled: true
hosts: [${consul_servers_address}]
client:
enabled: true
join: [${consul_servers_address}]
# TODO: enable dataDirectoryHostPath
grpc: true
nodeMeta:
pod-name: $${HOSTNAME}
host-ip: $${HOST_IP}
eks-cluster: ${eks_cluster}
resources:
requests:
memory: '8Gi'
cpu: '2'
limits:
memory: '8Gi'
cpu: '2'
updateStrategy: |
rollingUpdate:
maxUnavailable: 5
type: RollingUpdate
# https://learn.hashicorp.com/tutorials/consul/kubernetes-deployment-guide#consul-clients
extraConfig: |
{"advertise_reconnect_timeout": "15m"}
syncCatalog:
enabled: true
default: false # Need to add annotation the service: 'consul.hashicorp.com/service-sync': 'true'
toConsul: true
toK8S: false
k8sAllowNamespaces: ["*"]
k8sDenyNamespaces: ["kube-system", "kube-public"]
k8sTag: "runtime-k8s"
consulNodeName: "k8s-sync-${eks_cluster}"
priorityClassName: "system-node-critical"
connectInject:
enabled: false
controller:
enabled: false
meshGateway:
enabled: false
ingressGateways:
enabled: false
terminatingGateways:
enabled: false
prometheus:
enabled: false
tests:
enabled: false
Current understanding and Expected behavior
Sounds like the sync should happen on every change the controller detects from k8s. In fact, changes are synced to consul only on fullSync.
Comments from the code:
// ConsulSyncPeriod is how often the syncer will attempt to
// reconcile the expected service states with the remote Consul server.
// Syncer is responsible for syncing a set of Consul catalog registrations.
// An external system manages the set of registrations and periodically
// updates the Syncer. The Syncer should keep the remote system in sync with
// the given set of registrations.
// SyncPeriod is the interval between full catalog syncs. These will
// re-register all services to prevent overwrites of data. This should
// happen relatively infrequently and default to 30 seconds.
//
// ServicePollPeriod is the interval to look for invalid services to
// deregister. One request will be made for each synced service in
// Kubernetes.
//
// For both syncs, smaller more frequent and focused syncs may be
// triggered by known drift or changes.
Environment details
consul-k8s version: 0.40.0
Kubernetes version: v1.21
Cloud Provider: AWS
Hi @ezraroi, thank you for this great question. It had me looking back through some of the original design documents for this project. You are right that the catalog sync runs on fullSync, periodically at 30 second intervals by default. This is by design to avoid overwhelming the cluster with requests.
I'm not sure from the comments you provided where it is suggested otherwise. Please let me know any references which were confusing so that I can clarify them.
@t-eckert you can see from here:
// For both syncs, smaller more frequent and focused syncs may be
// triggered by known drift or changes.
and
// SyncPeriod is the interval between full catalog syncs. These will
// re-register all services to prevent overwrites of data. This should
// happen relatively infrequently and default to 30 seconds.
Also if we do hook and listen for changes in k8s i think we should sync them to consul. 30 seconds interval is very slow in real use cases. I think there 2 different types of sync:
- reconcile - Full sync on low interval
- live updates - As they happen in k8s
Hey @ezraroi
That is true and is something has been brought up before. There is a draft PR #228 and issues #315 and #319 that I believe are caused by this behavior/current implementation. Unfortunately, to make this work properly, it will likely require a rewrite of the sync catalog controller.
I'll keep this issue as an enhancement request.
I really hope this problem can be solved. Thank you!