azure-service-operator
azure-service-operator copied to clipboard
Bug: ASO should not try to delete undeletable resources
Version of Azure Service Operator
v2.7.0 (and earlier)
Describe the bug
Discovered while testing API version 2023-11-15 of documentdb - the type SqlDatabaseContainerThroughputSetting may not be deleted directly via ARM.
ASO knows this already, as we generate the correct list of supported operations in sql_database_container_throughput_setting_types_gen.go:
// GetSupportedOperations returns the operations supported by the resource
func (setting *SqlDatabaseContainerThroughputSetting) GetSupportedOperations() []genruntime.ResourceOperation {
return []genruntime.ResourceOperation{
genruntime.ResourceOperationGet,
genruntime.ResourceOperationPut,
}
}
However, we don't respect that when running, resulting in noisy errors:
test_logger.go:160: I2024-04-28T22:19:01Z] SqlDatabaseThroughputSettingController
"msg"="Error during Delete"
"err"="deleting resource "/subscriptions/82acd5bb-4206-47d4-9c12-a65db028483d/resourceGroups/asotest-rg-aokpof/providers/Microsoft.DocumentDB/databaseAccounts/sample-sqldb-account/sqlDatabases/sample-sql-db/throughputSettings/default": DELETE https://management.azure.com/subscriptions/82acd5bb-4206-47d4-9c12-a65db028483d/resourceGroups/asotest-rg-aokpof/providers/Microsoft.DocumentDB/databaseAccounts/sample-sqldb-account/sqlDatabases/sample-sql-db/throughputSettings/default
--------------------------------------------------------------------------------
RESPONSE 405: 405 Method Not Allowed
ERROR CODE: MethodNotAllowed
--------------------------------------------------------------------------------
{
"code": "MethodNotAllowed",
"message": "Message: The requested verb is not supported."
}
--------------------------------------------------------------------------------
" name="sample-sql-throughput"
namespace="aso-test-samples-creationanddeletion-test-sqldatabase-v1a-7e1fc"
azureName="default"
action="BeginDelete"
Expected behavior
We should respect the available verbs for the resource, and skip deletion from Azure where it's not possible, even though this may produce results surprising for some users.
Decision: Block direct deletion, returning an error for the condition indicating that direct deletion can't be done, must delete the parent. This avoids noisy error messages in the logs.
Two ways to approach - one is to block async, which makes delete a one-way operation. The other is to modify the webhook, which may have issues with cascading deletes from the parent.
Removed high priority from this given we haven't had many user complaints
Planning to move this to 2.10.0, but think we should get this fixed there as it's definitely awkward.
Decision: Block direct deletion, returning an error for the condition indicating that direct deletion can't be done, must delete the parent. This avoids noisy error messages in the logs.
This is still the decision. I think we have the infrastructure to do this now, because we know what HTTP verbs each resource supports.
We can check what HTTP verbs each resource supports via genruntime.ResourceOperationGet.IsSupportedBy(r.Obj). As discussed above there are two possible solutions here:
- If we block by webhook it will cause issues for cascading deletion, unless we make the webhook block more intelligent like "block delete unless parent delete is started (if it parent object is already gone that's OK too)"
- If we block on the async side, we'll have the behavior we mostly had today, except with maybe a more clear Condition reason and less spammy error logs (we could requeue with no error so no logging, or very minimal logging).
Given that users may submit a delete for all resources at once (including the child resource), there's (AFAIK) no great way to avoid rejecting the child delete in that case if we go the webhook route. I think that means we need to take the other path (block on async side).