[BUG] AI Foundry tests fail intermittently due to insufficient quota
Describe the bug
An increase in the number of live test runs has caused the AI Foundry tests to fail occasionally with an error like the following:
{
  "error": {
    "code": "InvalidTemplateDeployment",
    "message": "The template deployment 't367d90f162f247c3' is not valid according to the validation procedure. The tracking id is 'cbfe2fb8-32ec-4a1f-bf1d-300b6c8dc240'. See inner errors for details.",
    "details": [
      {
        "code": "InsufficientQuota ",
        "message": "This operation require 30 new capacity in quota Tokens Per Minute (thousands) - gpt-4o, which is bigger than the current available capacity 26. The current quota usage is 124 and the quota limit is 150 for quota Tokens Per Minute (thousands) - gpt-4o."
      }
    ]
  }
}
Taken from Azure Monitor; produced by this CI run.
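The inner error comes down to simple arithmetic: the deployment asks for 30 units (thousands of tokens per minute) of gpt-4o, but only 150 - 124 = 26 remain under the subscription limit. The minimal Python sketch below parses that message and re-derives the check. The regex and the `QuotaCheck` / `parse_quota_error` names are hypothetical, and the pattern is based only on the single message quoted above, so other wordings of the error may not match.

```python
import re
from dataclasses import dataclass

# Hypothetical pattern built from the one InsufficientQuota message above;
# other API versions or regions may word the message differently.
QUOTA_PATTERN = re.compile(
    r"require (?P<required>\d+) new capacity.*?"
    r"current available capacity (?P<available>\d+).*?"
    r"current quota usage is (?P<usage>\d+) and the quota limit is (?P<limit>\d+)"
)


@dataclass
class QuotaCheck:
    required: int  # capacity the new deployment asks for
    usage: int     # capacity already consumed in the subscription/region
    limit: int     # quota limit for this model

    @property
    def available(self) -> int:
        # Remaining capacity is the limit minus what is already in use.
        return self.limit - self.usage

    @property
    def fits(self) -> bool:
        # The deployment succeeds only if it fits in the remaining capacity.
        return self.required <= self.available


def parse_quota_error(message: str) -> QuotaCheck:
    match = QUOTA_PATTERN.search(message)
    if match is None:
        raise ValueError("message does not look like an InsufficientQuota error")
    check = QuotaCheck(int(match["required"]), int(match["usage"]), int(match["limit"]))
    # Sanity check: the reported available capacity should equal limit - usage.
    if check.available != int(match["available"]):
        raise ValueError("reported available capacity is inconsistent with usage/limit")
    return check


EXAMPLE_MESSAGE = (
    "This operation require 30 new capacity in quota Tokens Per Minute (thousands) - gpt-4o, "
    "which is bigger than the current available capacity 26. The current quota usage is 124 "
    "and the quota limit is 150 for quota Tokens Per Minute (thousands) - gpt-4o."
)

check = parse_quota_error(EXAMPLE_MESSAGE)
print(check.available, check.fits)  # 26 False
```

If soft-deleted AI Foundry resources keep holding part of the 124 units of usage, purging them would raise `available` without any change to the limit.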
@hallipr mentioned that this is likely caused by AI Foundry resources being soft-deleted by default, so the quota they consume may not be released when the test resources are cleaned up. We should investigate whether soft-deletion can be disabled in the deployment configuration to avoid insufficient-quota failures in the future.
Expected behavior
Tests should not fail due to insufficient resource quota.
Actual behavior
Tests fail because new resources cannot be deployed.
Reproduction Steps
Run the live tests for Azure.Mcp.Server (see here) multiple times in a short period, for example about 10 runs within 3-4 hours. I don't currently know the exact quota limits imposed on our test subscription.
Environment
No response
It looks like our resource cleanup functions already support purging some soft-deleted AI resources: https://github.com/Microsoft/mcp/blob/main/eng/common/scripts/Helpers/Resource-Helpers.ps1#L43-L44
We may need to add additional discovery and delete logic if the AI Foundry resources are not returned by the existing resource query:
/subscriptions/$subscriptionId/resourceGroups/$ResourceGroupName/providers/Microsoft.CognitiveServices/accounts?api-version=2024-10-01
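One possible reason the query above misses these resources: it only lists live accounts inside the resource group, while soft-deleted Cognitive Services accounts are, as far as I know, listed through a separate subscription-scoped `deletedAccounts` endpoint and removed permanently with a DELETE (purge) on a per-account path. The Python sketch below only builds those ARM paths and filters deleted accounts by resource group; it makes no HTTP calls. The `deletedAccounts` path shapes, the deleted-account id format, and reusing `api-version=2024-10-01` are assumptions to verify against the Cognitive Services management REST reference, and all function names are hypothetical.

```python
from dataclasses import dataclass

# Assumption: reuse the api-version from the existing query in Resource-Helpers.ps1.
API_VERSION = "2024-10-01"
PROVIDER = "Microsoft.CognitiveServices"


def live_accounts_path(subscription_id: str, resource_group: str) -> str:
    """The existing query: live accounts in one resource group."""
    return (f"/subscriptions/{subscription_id}/resourceGroups/{resource_group}"
            f"/providers/{PROVIDER}/accounts?api-version={API_VERSION}")


def deleted_accounts_path(subscription_id: str) -> str:
    """Assumed endpoint: soft-deleted accounts are listed per subscription."""
    return (f"/subscriptions/{subscription_id}/providers/{PROVIDER}"
            f"/deletedAccounts?api-version={API_VERSION}")


@dataclass
class DeletedAccount:
    location: str
    resource_group: str
    name: str

    def purge_path(self, subscription_id: str) -> str:
        """Assumed purge endpoint: a DELETE on this path removes the account for good."""
        return (f"/subscriptions/{subscription_id}/providers/{PROVIDER}"
                f"/locations/{self.location}/resourceGroups/{self.resource_group}"
                f"/deletedAccounts/{self.name}?api-version={API_VERSION}")


def parse_deleted_account_id(resource_id: str) -> DeletedAccount:
    """Split an id of the assumed form
    /subscriptions/{sub}/providers/Microsoft.CognitiveServices/locations/{loc}/resourceGroups/{rg}/deletedAccounts/{name}
    """
    parts = resource_id.strip("/").split("/")
    # Pair alternating key/value segments; ARM segment names are case-insensitive.
    segments = {parts[i].lower(): parts[i + 1] for i in range(0, len(parts) - 1, 2)}
    return DeletedAccount(
        location=segments["locations"],
        resource_group=segments["resourcegroups"],
        name=segments["deletedaccounts"],
    )


def purge_paths_for_group(subscription_id: str, resource_group: str,
                          deleted_ids: list[str]) -> list[str]:
    """Keep only deleted accounts that belonged to the given test resource group."""
    paths = []
    for resource_id in deleted_ids:
        account = parse_deleted_account_id(resource_id)
        # Resource group names are case-insensitive in ARM.
        if account.resource_group.lower() == resource_group.lower():
            paths.append(account.purge_path(subscription_id))
    return paths


# Hypothetical ids, shaped like the items a deletedAccounts listing is assumed to return.
EXAMPLE_IDS = [
    "/subscriptions/0000/providers/Microsoft.CognitiveServices/locations/eastus2"
    "/resourceGroups/test-rg/deletedAccounts/foundry-a",
    "/subscriptions/0000/providers/Microsoft.CognitiveServices/locations/eastus2"
    "/resourceGroups/other-rg/deletedAccounts/foundry-b",
]

for path in purge_paths_for_group("0000", "test-rg", EXAMPLE_IDS):
    print(path)
```

Filtering by resource group keeps the cleanup scoped to the test run's own resources, so soft-deleted accounts belonging to other pipelines in the shared subscription are left alone.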