Reconcile should perform a diff with Azure
At the moment we are doing a unilateral PUT of each resource when we reconcile; this works but has some drawbacks:
- Users with many resources have run up against throttling issues (e.g. #2591, #2643)
- Users have raised bugs because they thought the operator should only PUT when there's a change required (e.g. #2590)
We should diff the current state of the resource with the spec and only do a PUT where required (see #2600 for potential design).
Follow-up to #1491 as much of that has been implemented already.
Another thing I recently realized: Diffing will cause issues with Azure Policy, see: https://github.com/kubernetes-sigs/cluster-api-provider-azure/issues/3009#issuecomment-1419674132
Another idea that just occurred to me, documented here in case it's actually a good one (no guarantees): we could use a heuristic that reduces PUT load without requiring the full investment (and the problems) that come with client-side diffing.
A good example of such a heuristic would be:
- After the PUT, take the shape of the returned Azure resource (the Status), hash it, and store the hash in an annotation.
- Before any subsequent PUT, issue a GET and compare the hash of the result with the hash we stored.
We could have manually crafted customizations per resource, or possibly automatically generate customizations from the Swagger, to ignore fields like read-only timestamps that are constantly changing and would always cause the payloads to mismatch. That said, I don't think there are that many constantly changing fields in the APIs I have looked at.
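A minimal sketch of the hash heuristic above, in Go. The function names and the choice of volatile fields to skip are hypothetical illustrations, not ASO's actual implementation:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// statusHash computes a stable hash of an Azure resource status,
// first dropping per-resource "volatile" fields (the list here is
// hypothetical) that change on every GET and would otherwise always
// cause a mismatch.
func statusHash(status map[string]any, volatile []string) (string, error) {
	filtered := make(map[string]any, len(status))
	for k, v := range status {
		filtered[k] = v
	}
	for _, f := range volatile {
		delete(filtered, f)
	}
	// json.Marshal sorts map keys, so the encoding is deterministic.
	b, err := json.Marshal(filtered)
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(b)
	return hex.EncodeToString(sum[:]), nil
}

func main() {
	before := map[string]any{"provisioningState": "Succeeded", "lastModified": "10:00"}
	after := map[string]any{"provisioningState": "Succeeded", "lastModified": "10:05"}
	volatile := []string{"lastModified"}

	h1, _ := statusHash(before, volatile)
	h2, _ := statusHash(after, volatile)
	fmt.Println(h1 == h2) // same hash once the volatile field is ignored
}
```

With this shape, the annotation would only need to hold the hex digest, so annotation size limits are not a concern.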
This approach has a lot of advantages:
- It works with Azure policy.
- It's (relatively) easy to implement.
- It doesn't require perfect upstream Swagger specifications to correctly notate every readOnly field.
If we like this general idea but don't love the hash (it's hard to show exactly what differed with a hash), we could instead store the raw status JSON, compressed or uncompressed, in an annotation. (I say compressed because sticking the entire status into an annotation without shrinking it somehow might cause problems.) I do think a hash would work, though: we could log the input value, so we don't necessarily need a diff constructible from the annotation, as users (or we) can get it manually from the logs.
I believe that the above is actually what Terraform does - from what I can tell they issue a GET after a PUT and then use the result of that GET as the statefile.
They only (manually) extract certain fields from the GET response: presumably fields that they know are writable, thus ignoring all of the readonly fields.
We're still keen on doing this.
Is there any estimate of when ASO will stop using PUT for reconciliation and use the diff described above instead? Is there any guarantee it will land in 2.4.0? Any rough time schedule?
We're still keen on doing this.
Any specific plans when we could expect this feature?
We're still keen on doing this.
Any specific plans when we could expect this feature?
It's not currently that high on the list, mostly because it's quite difficult to implement generically. Can you expand on why you want/need it? AFAIK there are two key reasons it's interesting over just doing PUT:
- To reduce PUT load on the subscription and avoid throttling. We've made improvements to how often we PUT already though, and since then I haven't heard too many users run into throttling concerns.
- To avoid messing with resources whose PUT isn't idempotent, unless there is a change to be made. AFAIK very few resources fall into this category.
Does your desire fall into one of the above? Or is there another category of reason we're missing?
We are working on a solution which needs to be scalable: as soon as users/tenants create new services, we need to create new resources, and the number of expected resources could be higher than the current ASO throughput. A limit of 300/1200 resources per subscription may not be sufficient. So there's neither an estimate nor a guarantee it will be part of 2.4.0, do I understand that right?
Ah, yes our FAQ is actually slightly out of date here: It says,
> Azure subscriptions have a default limit of ~1200 PUTs
This is not actually strictly true. The limit is 1200 PUTs/hr per HTTP connection (well per frontend instance but HTTP connections are pinned to an instance so it basically boils down to that). We now have an HTTP client configured that uses multiple connections (see #2685), so the limit is higher than it used to be.
After we made the above changes, we haven't seen users actually hitting throttling in practice. That doesn't mean there isn't a limit (there is), but it's higher than 1200/hr by probably something like an order of magnitude. There are also limits on GETs so there's always going to be a maximum number of resources you can manage in a single subscription.
Don't necessarily let that FAQ entry scare you away - I've made a note to update it, but in reality between the improvements we've done and the ability to tune the azureSyncInterval you should have all of the tools you need to scale to a large number of resources today, even without this diffing logic.
That's not to say we aren't going to do this diffing stuff, but the approach we have now is actually pretty good from a "number of resources" perspective and so we're waiting for more signal to bump the importance of this up.
This is still something we're interested in, although given our throttling changes it doesn't seem as critical.
Still interested in this, it would still solve some theoretical problems, but with Azure's updated throttling that's being rolled out (or maybe is already out?) and the improvements we made last year and the year before it doesn't seem as urgent as it once did.
Following up on this:
> The limit is 1200 PUTs/hr per HTTP connection
ARM throughput limits have been increased significantly in 2024:
> Throttling limits have increased by roughly 30 times for writes, 2.4 times for deletes, and 7.5 times for reads.
The updated limits are higher in general, but the way throttling is applied has also changed. For writes, the bucket size is 200 tokens per subscription (and the same per tenant), and refill happens every second; the worked example suggests a refill rate of 10 tokens per second.
> For example, with a write bucket size of 200 tokens and a refill rate of 10 tokens per second, even after depleting the bucket with 200 API requests in a second, you can continue making 10 API requests per second.
The way I read it then, having multiple HTTP connections does not help and possibly makes things worse: a burst of requests would mean the initial token pool is exhausted quicker.
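To make the quoted example concrete, here is a toy model of the described token bucket. The 200/10 numbers come from the quoted docs; the type itself is purely illustrative, not Azure's or ASO's code:

```go
package main

import "fmt"

// tokenBucket models the described write throttling: a bucket of
// `capacity` tokens, refilled at `refillPerSec` tokens each second.
type tokenBucket struct {
	capacity, tokens, refillPerSec int
}

// tick advances one second: refill (capped at capacity), then serve up
// to `requests` calls. It returns how many requests were allowed.
func (b *tokenBucket) tick(requests int) int {
	b.tokens += b.refillPerSec
	if b.tokens > b.capacity {
		b.tokens = b.capacity
	}
	allowed := requests
	if allowed > b.tokens {
		allowed = b.tokens
	}
	b.tokens -= allowed
	return allowed
}

func main() {
	b := &tokenBucket{capacity: 200, tokens: 200, refillPerSec: 10}
	// Second 0: a burst of 200 writes drains the full bucket at once...
	fmt.Println(b.tick(200))
	// ...after which sustained throughput is capped at the refill rate,
	// regardless of how many connections the burst arrived over.
	fmt.Println(b.tick(200))
	fmt.Println(b.tick(200))
}
```

This is why spreading a burst over multiple HTTP connections wouldn't help under a per-subscription bucket: the tokens are shared, so extra connections only drain the pool faster.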
Hello folks, I've just discovered an occurrence of this issue that effectively performs useless calls to ARM, updating the remote resource for no reason.
Here's how to replicate the issue:
1. Have a resource group with a given set of tags.
2. Have an Azure policy that cascades the RG's tags down to any resources created under it.
3. Create any ASO CR, with or without tags.
When Kubernetes applies the resource, the resulting manifest holds in its `.status.tags` the list of all the tags that were cascaded by the Azure policy.
Every hour (the default reconciliation interval) the ASO reconciler performs another PUT to ARM.
Such behavior is clearly visible in the activity log of the controlled resource.
For instance, for a storage account, I observe many `Update Storage Account Create` actions, in which we keep continuously pushing a payload that contains:
"requestbody": "{\"kind\":\"FileStorage\",\"location\":\"westeurope\",\"name\":\"antoniotesta1b\",\"properties\":{\"allowBlobPublicAccess\":false,\"encryption\":{\"keySource\":\"Microsoft.Storage\",\"services\":{\"file\":{\"enabled\":true,\"keyType\":\"Account\"}}}},\"sku\":{\"name\":\"Premium_LRS\"},\"tags\":\"******\"}",
Strangely enough, the payload contains the tags as stars here; I wonder why.
I think the ****'s are Azure's logging redacting what could be sensitive tags. That's not from ASO.
> Every hour (default reconciliation interval) ASO reconciler performs again a PUT to ARM.
This is the expected behavior based on reconciliation interval today. You can turn it off, if you want - although it means that changes to the Azure objects won't be "fixed" by ASO without you triggering another reconciliation of the object. See AZURE_SYNC_PERIOD.
Note that, even if we performed diffing, we would still need to periodically query Azure to get the current state and diff it with the expected state - the only difference is which request we send, a PUT or a GET (followed by an optional PUT if there was a diff).
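That GET-then-optional-PUT flow could be sketched like this. The client interface and the naive top-level field comparison are hypothetical; real diffing would need to skip readOnly fields (like the policy-injected tags above):

```go
package main

import "fmt"

// armClient is a hypothetical interface; ASO's real client and
// reconciler are far more involved.
type armClient interface {
	Get(id string) (map[string]any, error)
	Put(id string, spec map[string]any) error
}

// reconcile sketches "GET, diff, optional PUT": we always pay for a GET
// each sync period, but the PUT only happens when a writable field in
// the spec differs from what Azure returned.
func reconcile(c armClient, id string, spec map[string]any) (didPut bool, err error) {
	current, err := c.Get(id)
	if err != nil {
		return false, err
	}
	for k, want := range spec {
		// Naive comparison of top-level fields; a real implementation
		// must ignore readOnly/server-populated fields.
		if current[k] != want {
			if err := c.Put(id, spec); err != nil {
				return false, err
			}
			return true, nil
		}
	}
	return false, nil
}

// fakeClient is an in-memory stand-in for ARM, used for illustration.
type fakeClient struct {
	state map[string]any
	puts  int
}

func (f *fakeClient) Get(id string) (map[string]any, error) { return f.state, nil }
func (f *fakeClient) Put(id string, spec map[string]any) error {
	f.puts++
	for k, v := range spec {
		f.state[k] = v
	}
	return nil
}

func main() {
	c := &fakeClient{state: map[string]any{"sku": "Standard_LRS"}}
	spec := map[string]any{"sku": "Premium_LRS"}
	put1, _ := reconcile(c, "sa1", spec) // drift: PUT happens
	put2, _ := reconcile(c, "sa1", spec) // no drift: GET only
	fmt.Println(put1, put2, c.puts)
}
```

Note the request count is the same in the steady state (one call per sync period); what changes is that the call is a cheap read instead of a write.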
We spoke about this and we want to write an ADR and explore the various scenarios. Things we discussed:
- How to avoid reconciling when mutable readonly fields (think: age or cpuLoad, etc) change?
- How to avoid losing our info on pod restart
Ideas we brainstormed:
- New CRD type that stores last update details?
- Store last seen in status field? new OperatorStatus section?
- Store last seen in annotation?
- Hashing?
- In-memory cache?
- Combo of hashing and in-memory cache (the hash is persistent; the in-memory cache obviously is not).
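The last idea could look roughly like this (all names are hypothetical): the hash survives pod restarts because it lives on the resource, while the in-memory cache keeps full payloads for cheap comparisons between restarts:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sync"
)

// lastSeen sketches the "hash is persistent, in-memory cache is not"
// combo: the hash would be written to the resource (annotation/status),
// while the cache holds full last-seen payloads only for this process.
type lastSeen struct {
	mu    sync.Mutex
	cache map[string][]byte // lost on restart
}

func hashOf(payload []byte) string {
	sum := sha256.Sum256(payload)
	return hex.EncodeToString(sum[:])
}

// changed reports whether payload differs from the last one we saw,
// falling back to the persisted hash when the cache is cold (e.g. just
// after a pod restart).
func (l *lastSeen) changed(key, persistedHash string, payload []byte) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	if cached, ok := l.cache[key]; ok {
		if string(cached) == string(payload) {
			return false
		}
	} else if persistedHash == hashOf(payload) {
		l.cache[key] = payload // warm the cache; nothing changed
		return false
	}
	l.cache[key] = payload
	return true
}

func main() {
	l := &lastSeen{cache: map[string][]byte{}}
	payload := []byte(`{"tags":{"env":"prod"}}`)
	h := hashOf(payload)
	fmt.Println(l.changed("sa1", h, payload))                          // cold cache, hash matches: unchanged
	fmt.Println(l.changed("sa1", h, []byte(`{"tags":{"env":"dev"}}`))) // cache hit, payload differs: changed
}
```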