[spike] Investigate DNSRecord events
The DNSRecord status is updated frequently because of the "queuedAt" field. Let's investigate whether this field can be removed without risking excessive communication with the DNS provider API.
This was discovered during the investigation into https://github.com/Kuadrant/kuadrant-operator/issues/1085. After the load test had completed, we noticed that a DNSRecord had triggered a number of events. Nothing in the logs suggested what actions the operator was performing.
Attached are the pod logs from two different load test runs. The pods were restarted with new resource limits between the runs.
kuadrant-operator-controller-manager-8464cd4785-82lfs-manager.log kuadrant-operator-controller-manager-db6784dfc-ghv9d-manager.log
With the current implementation of the DNSRecord status update, I'd say this is expected.
To allow multiple records, and therefore multiple clusters, to share a hostname, a mechanism that enforces the consistency of the records in the provider was added to the DNSRecord reconcile. In short, this means the DNSRecord is reconciled periodically (every 15 minutes, I believe). During initial creation this happens more frequently, then slowly backs off until the maximum polling interval (15 minutes) is reached. During this time, especially if multiple records are adding values to the same record set, we would expect multiple writes to the DNSRecord status.
The following is an example diff of the same record, taken after it had existed for a few hours:
31c31
< "resourceVersion": "90148",
---
> "resourceVersion": "121484",
219c219
< "queuedAt": "2025-01-13T22:34:58Z",
---
> "queuedAt": "2025-01-14T08:41:03Z",
A diff of a record that had existed for a shorter time (notice that validFor is changing):
31c31
< "resourceVersion": "35504",
---
> "resourceVersion": "39965",
219c219
< "queuedAt": "2025-01-13T17:12:40Z",
---
> "queuedAt": "2025-01-13T17:23:20Z",
338c338
< "validFor": "10m40s",
---
> "validFor": "15m0s",
There is also an issue with the endpoint list not being ordered, so it can sometimes be updated unnecessarily. However, because we update these other values in the status on every reconcile, an update happens regardless.
It might be possible to patch those fields instead of updating them. I'm not sure that makes any difference, though; I would have thought a patch still has to change the resourceVersion field in order to avoid conflicts.
The ever-increasing resourceVersion value is a bit concerning.
You might find more info on it here, although it's probably way out of date now, tbh.
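One way to stop endpoint ordering from causing writes on its own is to compare a normalized (sorted) copy of the endpoint lists before deciding to update. The sketch below uses a simplified, hypothetical `Endpoint` struct rather than the real type used by the DNSRecord. It also sorts targets, which assumes target order carries no meaning.

```go
package main

import (
	"fmt"
	"reflect"
	"sort"
	"strings"
)

// Endpoint is a simplified, hypothetical stand-in for the endpoint entries
// stored on a DNSRecord; the real type has more fields.
type Endpoint struct {
	DNSName    string
	RecordType string
	Targets    []string
}

// normalize returns a sorted deep copy so that two lists holding the same
// entries in a different order compare as equal. The input is not modified.
func normalize(in []Endpoint) []Endpoint {
	out := make([]Endpoint, len(in))
	for i, ep := range in {
		targets := append([]string(nil), ep.Targets...)
		sort.Strings(targets) // assumption: target order is not significant
		out[i] = Endpoint{DNSName: ep.DNSName, RecordType: ep.RecordType, Targets: targets}
	}
	sort.Slice(out, func(i, j int) bool {
		if out[i].DNSName != out[j].DNSName {
			return out[i].DNSName < out[j].DNSName
		}
		if out[i].RecordType != out[j].RecordType {
			return out[i].RecordType < out[j].RecordType
		}
		// Tie-break on targets so the resulting order is deterministic.
		return strings.Join(out[i].Targets, ",") < strings.Join(out[j].Targets, ",")
	})
	return out
}

// endpointsEqual reports whether two endpoint lists differ only in ordering.
func endpointsEqual(a, b []Endpoint) bool {
	return reflect.DeepEqual(normalize(a), normalize(b))
}

func main() {
	a := []Endpoint{
		{DNSName: "api.example.com", RecordType: "A", Targets: []string{"10.0.0.2", "10.0.0.1"}},
		{DNSName: "api.example.com", RecordType: "TXT", Targets: []string{"owner=a"}},
	}
	b := []Endpoint{
		{DNSName: "api.example.com", RecordType: "TXT", Targets: []string{"owner=a"}},
		{DNSName: "api.example.com", RecordType: "A", Targets: []string{"10.0.0.1", "10.0.0.2"}},
	}
	fmt.Println(endpointsEqual(a, b)) // true: same entries, different order
}
```

As noted above, this only helps once the other per-reconcile status values stop changing. Otherwise the write still happens.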
@Boomatang is this ticket still relevant?