pulumi-azure-native
pulumi-azure-native copied to clipboard
Refreshing project involving machine learning services managed network settings causes resources to be replaced
What happened?
When creating a MachineLearningService.Workspace
it's possible to create the managednetwork outbound rules as a separate resource (and apparently customers are encouraged to do so by Microsoft). However, when you do this and run pulumi refresh
it updates the workspace resource with the values set in the network settings resource and then deletes the network settings rule resource.
Then when the update happens, Pulumi recreates the settingsrule resource (because it no longer exists in Azure) and clears out the managed settings rule in the workspace and then the same thing happens all over again.
Example
Code can be found here: https://github.com/pulumi/customer-engineering/tree/master/customer-support/5019 (private repo)
Output of pulumi about
CLI
Version 3.113.0
Go Version go1.22.2
Go Compiler gc
Plugins
NAME VERSION
azure-native 2.27.0
dotnet unknown
Host
OS darwin
Version 14.4.1
Arch arm64
This project is written in dotnet: executable='/usr/local/share/dotnet/dotnet' version='8.0.201'
Current Stack: pierskarsenbarg/azure-ml-csharp/dev
TYPE URN
pulumi:pulumi:Stack urn:pulumi:dev::azure-ml-csharp::pulumi:pulumi:Stack::azure-ml-csharp-dev
pulumi:providers:azure-native urn:pulumi:dev::azure-ml-csharp::pulumi:providers:azure-native::default_2_27_0
azure-native:resources:ResourceGroup urn:pulumi:dev::azure-ml-csharp::azure-native:resources:ResourceGroup::pl-cs-ml-rg
azure-native:storage:StorageAccount urn:pulumi:dev::azure-ml-csharp::azure-native:storage:StorageAccount::mlsa
azure-native:operationalinsights:Workspace urn:pulumi:dev::azure-ml-csharp::azure-native:operationalinsights:Workspace::appInsights
azure-native:insights:Component urn:pulumi:dev::azure-ml-csharp::azure-native:insights:Component::component
azure-native:machinelearningservices/v20231001:Workspace urn:pulumi:dev::azure-ml-csharp::azure-native:machinelearningservices/v20231001:Workspace::mlworkspace
Found no pending operations associated with dev
Backend
Name pulumi.com
URL https://app.pulumi.com/pierskarsenbarg
User pierskarsenbarg
Organizations pierskarsenbarg, karsenbarg, team-ce, demo
Token type personal
Dependencies:
NAME VERSION
Pulumi 3.61.0
Pulumi.AzureNative 2.27.0
Pulumi locates its logs in /var/folders/x8/cdd9j87s607fwpy0q62mfmmw0000gn/T/ by default
Additional context
If you look at state export that I've taken before and after the refresh, you can see here that the outboundRules
input is empty and we have networksettingsrule
resource here: https://github.com/pulumi/customer-engineering/blob/60d8ce2e2835bab670bce807e276934186ebb1fd/customer-support/5019/stack-before-refresh.json#L421-L479
but after the refresh the ML workspace now has the network settings in the outboundRule
section: https://github.com/pulumi/customer-engineering/blob/60d8ce2e2835bab670bce807e276934186ebb1fd/customer-support/5019/stack-after-refresh.json#L324-L334 and the setting rule resource is gone.
I've also added https://github.com/pulumi/customer-engineering/blob/master/customer-support/5019/out.txt which are verbose logs and if you search for HTTP/2.0 404 Not Found
you can see that it's the rules that can't be found.
Contributing
Vote on this issue by adding a π reaction. To contribute a fix for this issue, leave a comment (and link to your pull request, if you've opened one already).
This looks like a similar problem to https://github.com/pulumi/pulumi-azure-native/issues/611 and https://github.com/pulumi/pulumi-azure-native/issues/1112 - where certain sub-property collections are also represented as standalone resources.
One oddity here is that the standalone resource gets removed on refresh. We'd need to identify why this isn't returned correctly by the API as there might be something else going on. Otherwise, the existing mechanism should be sufficient to handle maintaining the existing property transparently if it wasn't originally set.
It looks like there's an issue with this resource's GET function. Running the repro myself showed this warning in the diagnostics:
Updating (dev)
View in Browser (Ctrl+O): https://app.pulumi.com/daniel-pulumi-corp/sdf/dev/updates/43
Type Name Status Info
pulumi:pulumi:Stack sdf-dev
+ ββ azure-native:resources:ResourceGroup pl-cs-ml-rg created (0.66s)
+ ββ azure-native:storage:StorageAccount mlsa created (23s)
+ ββ azure-native:operationalinsights:Workspace appInsights created (31s)
+ ββ azure-native:keyvault:Vault vault created (34s)
+ ββ azure-native:insights:Component component created (1s)
+ ββ azure-native:machinelearningservices/v20231001:Workspace mlworkspace created (33s)
+ ββ azure-native:machinelearningservices:ManagedNetworkSettingsRule mlsrule created (32s) 1 warning
Diagnostics:
azure-native:machinelearningservices:ManagedNetworkSettingsRule (mlsrule):
warning: Failed to read resource after Create. Please report this issue.
Verbose logs contain more information, see https://www.pulumi.com/docs/support/troubleshooting/#verbose-logging.
Here's the repro, modified to include the missing Vault
using Pulumi;
using Pulumi.AzureNative.Resources;
using Pulumi.AzureNative.Storage;
using OperationalInsights = Pulumi.AzureNative.OperationalInsights;
using Pulumi.AzureNative.OperationalInsights.Inputs;
using Insights = Pulumi.AzureNative.Insights;
using Pulumi.AzureNative.KeyVault;
using MLS = Pulumi.AzureNative.MachineLearningServices;
using Pulumi.AzureNative.MachineLearningServices.Inputs;
using System;
return await Pulumi.Deployment.RunAsync(() =>
{
var resourceGroup = new ResourceGroup("pl-cs-ml-rg", new ResourceGroupArgs
{
Location = "uksouth"
});
var storageAccount = new StorageAccount("mlsa", new Pulumi.AzureNative.Storage.StorageAccountArgs
{
ResourceGroupName = resourceGroup.Name,
Sku = new Pulumi.AzureNative.Storage.Inputs.SkuArgs
{
Name = Pulumi.AzureNative.Storage.SkuName.Standard_LRS
},
Kind = Kind.StorageV2
});
var appInsights = new OperationalInsights.Workspace("appInsights", new()
{
Features = new WorkspaceFeaturesArgs
{
EnableLogAccessUsingOnlyResourcePermissions = true
},
PublicNetworkAccessForIngestion = OperationalInsights.PublicNetworkAccessType.Enabled,
PublicNetworkAccessForQuery = OperationalInsights.PublicNetworkAccessType.Enabled,
ResourceGroupName = resourceGroup.Name,
RetentionInDays = 30,
Sku = new WorkspaceSkuArgs
{
Name = OperationalInsights.WorkspaceSkuNameEnum.PerGB2018
},
WorkspaceCapping = new WorkspaceCappingArgs
{
DailyQuotaGb = -1
},
WorkspaceName = "pkmlcsopworkspace"
});
var insightsComponent = new Insights.Component("component", new()
{
ApplicationType = Insights.ApplicationType.Web,
FlowType = "Redfield",
IngestionMode = Insights.IngestionMode.LogAnalytics,
Kind = "web",
PublicNetworkAccessForIngestion = Insights.PublicNetworkAccessType.Enabled,
PublicNetworkAccessForQuery = Insights.PublicNetworkAccessType.Enabled,
RequestSource = "IbizaMachineLearningExtension",
ResourceGroupName = resourceGroup.Name,
ResourceName = "pkmlcsinworkspace",
RetentionInDays = 90,
WorkspaceResourceId = appInsights.Id
});
var vault = new Vault("vault", new VaultArgs
{
ResourceGroupName = resourceGroup.Name,
Properties = new Pulumi.AzureNative.KeyVault.Inputs.VaultPropertiesArgs
{
TenantId = "xxxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxx",
Sku = new Pulumi.AzureNative.KeyVault.Inputs.SkuArgs
{
Family = "A",
Name = Pulumi.AzureNative.KeyVault.SkuName.Standard,
},
}
});
var mlWorkspace = new MLS.V20231001.Workspace("mlworkspace", new()
{
ApplicationInsights = insightsComponent.Id,
FriendlyName = "pk-ml-cs-workspace",
Description = "",
DiscoveryUrl = "https://uksouth.api.azureml.ms/discovery",
HbiWorkspace = false,
Identity = new MLS.V20231001.Inputs.ManagedServiceIdentityArgs
{
Type = MLS.V20231001.ManagedServiceIdentityType.SystemAssigned
},
KeyVault = vault.Id,
ManagedNetwork = new MLS.V20231001.Inputs.ManagedNetworkSettingsArgs
{
IsolationMode = "AllowInternetOutbound",
OutboundRules = new InputMap<object>()
},
PublicNetworkAccess = MLS.V20231001.PublicNetworkAccess.Enabled,
ResourceGroupName = resourceGroup.Name,
Sku = new MLS.V20231001.Inputs.SkuArgs
{
Name = "Basic",
Tier = MLS.V20231001.SkuTier.Basic
},
StorageAccount = storageAccount.Id,
WorkspaceName = "pk-ml-cs-workspace"
});
var rule = new MLS.ManagedNetworkSettingsRule("mlsrule", new()
{
ResourceGroupName = resourceGroup.Name,
WorkspaceName = mlWorkspace.Name,
Properties = new PrivateEndpointOutboundRuleArgs
{
Destination = new PrivateEndpointDestinationArgs
{
SparkEnabled = false,
ServiceResourceId = storageAccount.Id,
SubresourceTarget = "blob"
},
Category = MLS.RuleCategory.UserDefined,
Type = "PrivateEndpoint"
},
});
});
Running this with verbose logs shows that it's attempting to perform the GET against https://management.azure.com/pk-ml-cs-workspace-2/outboundRules/mlsrule?api-version=2023-10-01
.
According to the specification, this endpoint should be /subscriptions/{subscriptionId}/resourceGroups/{resourceGroupName}/providers/Microsoft.MachineLearningServices/workspaces/{workspaceName}/outboundRules/{ruleName}
.
Running this under a debugger shows that the URL is being constructed from the resource's id
field, which for all other resources starts with /subscriptions/0282681f-7a9e-424b-80b2-96babd57a8a1/resourceGroups/pl-cs-ml-rg20c310d8
etc, but this resource is starting with just the id of the ml workspace.
Side note: I don't seem able to delete this rule resource within the portal either - nothing happens when I click the delete button:
No you have to delete the ml workspace to delete the rule in the portal.
When performing the create, we calculate the id
based on the specification and send the PUT
request correctly.
The Azure response contains an id
field which, if present, we take to be the authoritive source of the canonical Azure identifier for the resource:
https://github.com/pulumi/pulumi-azure-native/blob/d422a4264e04244bd42a188b798e4c72fec7fa73/provider/pkg/provider/provider.go#L909-L911
However, on this API, Azure is returning an identifier which is missing the subscription and resource group context:
Original ID: "/subscriptions/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx/resourceGroups/pl-cs-ml-rgxxxxxx/providers/Microsoft.MachineLearningServices/workspaces/pk-ml-cs-workspace-2/outboundRules/mlsrule"
Azure's ID: "pk-ml-cs-workspace-2/outboundRules/mlsrule"
This incorrect identifier is then used for subsequent requests, which then fail.
It knows itβs attached to the workspace because it attaches the rule to the workspace which is weird. I thought it was like that vnet/subnet issue from before
Manually fixing the id
field in the state (exporting, editing then importing) the performing a refresh shows only the diff of the Workspace resource:
Type Name Plan Info
pulumi:pulumi:Stack sdf-dev
ββ azure-native:resources:ResourceGroup pl-cs-ml-rg
ββ azure-native:machinelearningservices/v20231001:ManagedNetworkSettingsRule mlsrule
ββ azure-native:keyvault:Vault vault
~ ββ azure-native:machinelearningservices/v20231001:Workspace mlworkspace update [diff: ~managedNetwork,systemData]
ββ azure-native:storage:StorageAccount mlsa
ββ azure-native:operationalinsights:Workspace appInsights
ββ azure-native:insights:Component component
Resources:
~ 1 to update
7 unchanged
Therefore there's two separate issues at play here:
- Azure's API is broken and returning the wrong
id
- causing subsequent operations to the rule to fail. - We need to introduce the parent-child awareness to the Workspace to ensure it doesn't overwrite when all rules are managed externally (like VNets).
As it stands, I would advise against using the standalone rule resource until it's fixed by Azure as it's not currently workable given the broken API. At that point, it would be worth considering the enhancement to make it easier to manage rules just as external resources similar to VNets/Subnets.
Yeah Iβve already advised them not to but they were only doing it because MS told them to.
I donβt know how keen they are to start pushing MS to fix it but Iβll find out
Ok, marking as blocked as there's no reason to start work on the ability to use standalone rule resources until we can successfully rules as resources without errors from the API.
@danielrbradley which API endpoint is returning the wrong id? We are engaging with MS on this from our side.
@Werner-Swart-83 the incorrect ID is being returned after awaiting the result of the PUT /subscriptions/{subscriptionId}/resourceGroups/{resourceGroupName}/providers/Microsoft.MachineLearningServices/workspaces/{workspaceName}/outboundRules/{ruleName}
creates a response which includes an id
field with the value {workspaceName}/outboundRules/{ruleName}
.
e.g. PUT /subscriptions/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx/resourceGroups/pl-cs-ml-rgxxxxxx/providers/Microsoft.MachineLearningServices/workspaces/pk-ml-cs-workspace-2/outboundRules/mlsrule
Responds with { "id": "pk-ml-cs-workspace-2/outboundRules/mlsrule", ... }
@danielrbradley MS has confirmed that the fix for that endpoint has been applied to WestEurope.
That's great! Thank you for the update @Werner-Swart-83
@danielrbradley Is this issue unblocked then?
@danielrbradley Is this issue unblocked then?
This might be entirely fixed now by Microsoft's change. We should re-test and check if it still has an issue around being managed as part of a parent resource or if that was an incorrect assessment in the early comments.
If we can confirm it's fixed and the repro in https://github.com/pulumi/pulumi-azure-native/issues/3225#issuecomment-2066546278 now passes, then we can close out this ticket.
Great! Marked this as needs-triage
to re-test it.
On re-testing now, I see the following refresh:
Type Name Plan Info
pulumi:pulumi:Stack sdf-dev
~ ββ azure-native:machinelearningservices/v20231001:Workspace mlworkspace update [diff: ~managedNetwork,systemData]
ββ azure-native:resources:ResourceGroup pl-cs-ml-rg
ββ azure-native:machinelearningservices/v20231001:ManagedNetworkSettingsRule mlsrule
ββ azure-native:operationalinsights:Workspace appInsights
ββ azure-native:insights:Component component
ββ azure-native:storage:StorageAccount mlsa
ββ azure-native:keyvault:Vault vault
Resources:
~ 1 to update
7 unchanged
The deletion of the ManagedNetworkSettingsRule
no longer appears, so I'll close this as complete.