[BUG] AI Foundry tests fail intermittently due to insufficient quota

Open vcolin7 opened this issue 1 month ago • 1 comment

Describe the bug

An uptick in live test runs has made the AI Foundry tests occasionally fail with a message like the following:

{
    "error": {
        "code": "InvalidTemplateDeployment",
        "message": "The template deployment 't367d90f162f247c3' is not valid according to the validation procedure. The tracking id is 'cbfe2fb8-32ec-4a1f-bf1d-300b6c8dc240'. See inner errors for details.",
        "details": [
            {
                "code": "InsufficientQuota ",
                "message": "This operation require 30 new capacity in quota Tokens Per Minute (thousands) - gpt-4o, which is bigger than the current available capacity 26. The current quota usage is 124 and the quota limit is 150 for quota Tokens Per Minute (thousands) - gpt-4o."
            }
        ]
    }
}

Taken from Azure Monitor. Produced by this CI run
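To make the numbers in that error easier to read, here is a minimal Python sketch that extracts them from the `InsufficientQuota` detail and computes how far short the deployment was. The function name `quota_shortfall` and the regular expression are illustrative assumptions, not part of the test infrastructure, and the JSON is a trimmed copy of the payload above.

```python
import json
import re

# Trimmed copy of the error payload quoted above (outer "message" omitted).
ERROR_JSON = """
{
    "error": {
        "code": "InvalidTemplateDeployment",
        "details": [
            {
                "code": "InsufficientQuota ",
                "message": "This operation require 30 new capacity in quota Tokens Per Minute (thousands) - gpt-4o, which is bigger than the current available capacity 26. The current quota usage is 124 and the quota limit is 150 for quota Tokens Per Minute (thousands) - gpt-4o."
            }
        ]
    }
}
"""

# Hypothetical pattern matched against the message wording shown above;
# the service may phrase the message differently in other cases.
PATTERN = re.compile(
    r"requires? (?P<required>\d+) new capacity.*?"
    r"available capacity (?P<available>\d+)\..*?"
    r"quota usage is (?P<usage>\d+) and the quota limit is (?P<limit>\d+)"
)


def quota_shortfall(error_json: str) -> dict:
    """Return the quota numbers from the first InsufficientQuota detail."""
    details = json.loads(error_json)["error"]["details"]
    for detail in details:
        # The logged code carries a trailing space, so strip before comparing.
        if detail["code"].strip() != "InsufficientQuota":
            continue
        match = PATTERN.search(detail["message"])
        if match:
            numbers = {key: int(value) for key, value in match.groupdict().items()}
            numbers["shortfall"] = numbers["required"] - numbers["available"]
            return numbers
    raise ValueError("no parseable InsufficientQuota detail found")


result = quota_shortfall(ERROR_JSON)
print(result)
```

For this payload the limit minus the usage (150 - 124) equals the reported available capacity of 26, so the requested 30 units exceed what is free by 4.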

@hallipr mentioned that this is likely caused by AI Foundry resources being soft-deleted by default, which would leave their quota allocated until they are purged. We should investigate whether soft-deletion can be disabled through configuration, so that test runs stop hitting insufficient-quota errors.

Expected behavior

Tests should not fail due to insufficient resource quota.

Actual behavior

Tests fail because new resources cannot be deployed.

Reproduction Steps

Run the live tests for Azure.Mcp.Server (see here) multiple times in a short period. Maybe 10-ish in about 3-4 hours? I don't really know the quota limits imposed on our test subscription at this time.

Environment

No response

Nov 14 '25 20:11 vcolin7

It looks like our resource cleanup functions support some AI resource soft deletes: https://github.com/Microsoft/mcp/blob/main/eng/common/scripts/Helpers/Resource-Helpers.ps1#L43-L44

We may need to add additional discovery and delete logic if the soft-deleted resources aren't returned by the resource query: /subscriptions/$subscriptionId/resourceGroups/$ResourceGroupName/providers/Microsoft.CognitiveServices/accounts?api-version=2024-10-01
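As a rough illustration of what that extra discovery and delete logic might target, the Python sketch below builds the request URLs involved without making any network calls. The `accounts` path and API version come from the query above. The `deletedAccounts` list and purge paths are written from memory of the Cognitive Services management REST API and should be treated as assumptions to verify, as should reusing the same API version for them. All function names are hypothetical.

```python
MANAGEMENT_ENDPOINT = "https://management.azure.com"
API_VERSION = "2024-10-01"  # taken from the resource query above
PROVIDER = "Microsoft.CognitiveServices"


def active_accounts_url(subscription_id: str, resource_group: str) -> str:
    """URL for the existing resource query (live accounts in one resource group)."""
    return (
        f"{MANAGEMENT_ENDPOINT}/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        f"/providers/{PROVIDER}/accounts?api-version={API_VERSION}"
    )


def deleted_accounts_url(subscription_id: str) -> str:
    """Assumed URL for listing soft-deleted accounts across the subscription."""
    return (
        f"{MANAGEMENT_ENDPOINT}/subscriptions/{subscription_id}"
        f"/providers/{PROVIDER}/deletedAccounts?api-version={API_VERSION}"
    )


def purge_account_url(
    subscription_id: str, location: str, resource_group: str, account_name: str
) -> str:
    """Assumed URL for purging one soft-deleted account (sent as HTTP DELETE)."""
    return (
        f"{MANAGEMENT_ENDPOINT}/subscriptions/{subscription_id}"
        f"/providers/{PROVIDER}/locations/{location}"
        f"/resourceGroups/{resource_group}"
        f"/deletedAccounts/{account_name}?api-version={API_VERSION}"
    )


# Placeholder identifiers, for illustration only.
print(active_accounts_url("my-sub", "my-rg"))
print(deleted_accounts_url("my-sub"))
print(purge_account_url("my-sub", "eastus", "my-rg", "my-account"))
```

The point of the sketch is that the list of soft-deleted accounts would be scoped to the subscription rather than to a resource group, so a query scoped to `$ResourceGroupName` alone would not be expected to surface them.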

Nov 14 '25 20:11 hallipr