packer-plugin-azure

Azure Container Apps using managed identities

Open joeybdub opened this issue 1 year ago • 6 comments

Overview of the Issue

Azure Container Apps use a different URL to retrieve the managed identity than the one currently in the code.

Running Packer on Azure Container Apps always fails because of this code: https://github.com/hashicorp/packer-plugin-azure/blob/516ae5e5d0caead4579f8216fd0f4bba80e1e39b/builder/azure/common/client/config.go#L257 and https://github.com/hashicorp/packer-plugin-azure/blob/516ae5e5d0caead4579f8216fd0f4bba80e1e39b/builder/azure/common/client/config_retriever.go#L18C11-L18C12, which appear to use the default URL "http://169.254.169.254/metadata/instance/compute". That address is not reachable on Azure Container Apps, and possibly not on other Azure container resources either. Azure Container Apps use MSI_ENDPOINT=http://localhost:42356/msi/token to retrieve managed identities. Could an if statement or check be added so that the alternative URL above is used when running on Azure Container Apps?

Reproduction Steps

Steps to reproduce this issue: run packer build on an Azure Container App, e.g.

packer build -var "image_version=XXXXXX" -var "client_id=XXXXXXXXX" -var "subscription_id=XXXXXXXXXX" -var "resource_group_location=XXXXXX" -var "location=XXXXXXX" -var "os_type=Windows" -var "image_publisher=XXXXXXXXX" -var "image_offer=XXXXXXXX" -var "image_sku=XXXXXX" -var "vm_size=XXXXXXXX" -var "gallery_name=XXXXX" -var "image_version=XXXXXX" -var "image_name=XXXXXXXXX" -var "resource_group_name=XXXXXXXX" -var-file "./variables.pkrvars.hcl" .

Plugin and Packer version

plugin version: 2.1.2, packer version: 1.110.3

Simplified Packer Buildfile

build {
  sources = [ "source.azure-arm.blah" ]

  provisioner "windows-update" {
    search_criteria = "IsInstalled=0"
  }
}

source "azure-arm" "xxxxxx" {
  image_offer                                       = var.image_offer
  image_publisher                                   = var.image_publisher
  image_sku                                         = var.image_sku
  location                                          = var.location
  managed_image_name                                = var.managed_image_name
  managed_image_resource_group_name                 = var.managed_image_resource_group_name
  os_type                                           = var.os_type
  vm_size                                           = var.vm_size
  subscription_id                                   = var.subscription_id
  managed_image_storage_account_type                = var.managed_image_storage_account_type
  private_virtual_network_with_public_ip            = var.private_virtual_network_with_public_ip
  shared_gallery_image_version_exclude_from_latest  = var.shared_gallery_image_version_exclude_from_latest

  shared_image_gallery_destination {
    gallery_name         = var.gallery_name
    image_name           = var.image_name
    image_version        = var.image_version
    replication_regions  = var.replication_regions
    resource_group       = var.resource_group_name
    subscription         = var.subscription_id
    storage_account_type = var.storage_account_type
  }

  communicator   = "winrm"
  winrm_use_ssl  = true
  winrm_insecure = true
  winrm_timeout  = "20m"
  winrm_username = "packer"
  #async_resourcegroup_delete": "true",
}

Log Fragments and crash.log files

Running builder ... Build 'azure-arm.xxxxxx' errored after 30 seconds 1 millisecond: error fetching subscriptionID from VM metadata service for Managed Identity authentication: Get "http://169.254.169.254/metadata/instance/compute?api-version=2017-08-01&format=json": dial tcp 169.254.169.254:80: i/o timeout

joeybdub, Apr 25 '24 08:04

@joeybdub can you try setting your subscription_id in the template itself? As you can see from the linked code, if you do so the plugin will not fetch it from the VM's IMDS service. I am not sure I would classify this as a bug, since we have not previously supported this type of build. We do not currently test the plugin against Azure Container Applications, so I can't assure you this will work, but it'll get you past that error.
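The fallback described here can be sketched roughly as follows. This is a minimal illustration in Go, not the plugin's actual code: `resolveSubscriptionID`, `metadataFetcher`, and `unreachableIMDS` are hypothetical names, and the simulated timeout only mimics the error from the log above.

```go
package main

import (
	"errors"
	"fmt"
)

// metadataFetcher is a hypothetical stand-in for the plugin's IMDS lookup.
type metadataFetcher func() (string, error)

// unreachableIMDS simulates an environment where 169.254.169.254 times out,
// as reported for Azure Container Apps in this issue.
func unreachableIMDS() (string, error) {
	return "", errors.New("dial tcp 169.254.169.254:80: i/o timeout")
}

// resolveSubscriptionID prefers the value from the template and only falls
// back to the metadata service when subscription_id is empty.
func resolveSubscriptionID(configured string, fetch metadataFetcher) (string, error) {
	if configured != "" {
		return configured, nil
	}
	id, err := fetch()
	if err != nil {
		return "", fmt.Errorf("error fetching subscriptionID from VM metadata service: %w", err)
	}
	return id, nil
}

func main() {
	// With a configured value, the metadata service is never contacted.
	id, err := resolveSubscriptionID("00000000-0000-0000-0000-000000000000", unreachableIMDS)
	fmt.Println(id, err)

	// With an empty value, the lookup runs and fails where IMDS is unreachable.
	_, err = resolveSubscriptionID("", unreachableIMDS)
	fmt.Println(err)
}
```

In this sketch a variable that resolves to an empty string behaves exactly like an unset subscription_id, so it may be worth confirming that var.subscription_id actually carries a value at build time.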

JenGoldstrich, Apr 25 '24 16:04

@JenGoldstrich I do have the subscription_id in the template, as shown below. Unless there is another location?

source "azure-arm" "pollinatebuild" {
  image_offer                       = var.image_offer
  image_publisher                   = var.image_publisher
  image_sku                         = var.image_sku
  location                          = var.location
  managed_image_name                = var.managed_image_name
  managed_image_resource_group_name = var.managed_image_resource_group_name

  subscription_id = var.subscription_id

  build_resource_group_name                        = var.build_resource_group_name
  os_type                                          = var.os_type
  vm_size                                          = var.vm_size
  managed_image_storage_account_type               = var.managed_image_storage_account_type
  private_virtual_network_with_public_ip           = var.private_virtual_network_with_public_ip
  shared_gallery_image_version_exclude_from_latest = var.shared_gallery_image_version_exclude_from_latest

  shared_image_gallery_destination {
    gallery_name         = var.gallery_name
    image_name           = var.image_name
    image_version        = var.image_version
    replication_regions  = var.replication_regions
    resource_group       = var.resource_group_name
    subscription         = var.subscription_id
    storage_account_type = var.storage_account_type
  }

  communicator   = "winrm"
  winrm_use_ssl  = true
  winrm_insecure = true
  winrm_timeout  = "20m"
  winrm_username = "packer"
  #async_resourcegroup_delete": "true",
}

joeybdub, May 01 '24 09:05

Hello,

We are also encountering the same issue when running packer on ACA. We are passing the subscription_id and the client_id in the template, which bypasses the error above. However, the plugin then calls the go-azure-sdk and the following error is returned:

2024/05/08 09:20:04 packer-plugin-azure_v2.1.3_x5.0_linux_amd64 plugin: 2024/05/08 09:20:04 [ERR] GET http://169.254.169.254/metadata/identity/oauth2/token?api-version=2018-02-01&client_id=xxx&resource=https%3A%2F%2Fmanagement.azure.com request failed: Get "http://169.254.169.254/metadata/identity/oauth2/token?api-version=2018-02-01&client_id=xxx&resource=https%3A%2F%2Fmanagement.azure.com": context canceled

It should be calling the endpoint defined in the environment variables on ACA, not 169.254.169.254:

IDENTITY_ENDPOINT=http://localhost:42356/msi/token
MSI_ENDPOINT=http://localhost:42356/msi/token

The azure SDK is called by packer here: https://github.com/hashicorp/packer-plugin-azure/blob/5fb2e3e0105a1bafc69c038f2d64eee42b968d07/builder/azure/common/client/azure_authorizer.go#L85

There is an option in the SDK to specify a CustomManagedIdentityEndpoint. Maybe packer can add some config in the auth.Credentials struct if the environment variables are defined? I would have expected the azure SDK to do it though.

What do you think?

https://github.com/hashicorp/go-azure-sdk/blob/main/sdk/auth/managed_identity_authorizer.go#L140C5-L140C35
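To make the suggestion concrete, here is a minimal sketch in Go of choosing the endpoint from the environment. It is not code from packer-plugin-azure or go-azure-sdk: `managedIdentityEndpoint` and `defaultIMDSTokenEndpoint` are hypothetical names, and checking IDENTITY_ENDPOINT before MSI_ENDPOINT is an assumption made only for this example. The default URL is the one from the log line above.

```go
package main

import (
	"fmt"
	"os"
)

// defaultIMDSTokenEndpoint is the address seen in the log line above.
const defaultIMDSTokenEndpoint = "http://169.254.169.254/metadata/identity/oauth2/token"

// managedIdentityEndpoint returns the first non-empty endpoint variable and
// falls back to the IMDS address otherwise. The lookup order is an assumption.
// getenv is injected so the function can be exercised without touching the
// real process environment.
func managedIdentityEndpoint(getenv func(string) string) string {
	for _, name := range []string{"IDENTITY_ENDPOINT", "MSI_ENDPOINT"} {
		if v := getenv(name); v != "" {
			return v
		}
	}
	return defaultIMDSTokenEndpoint
}

func main() {
	// On a VM neither variable is normally set, so the default is printed.
	fmt.Println(managedIdentityEndpoint(os.Getenv))
}
```

The resulting value could then be handed to whatever custom endpoint option the SDK exposes. Whether the token request itself succeeds against that endpoint is a separate question this sketch does not answer.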

perpective2410, May 08 '24 10:05

Hey @florentvaldelievre, the SDK does not currently test against or support Azure Container Apps; I have clarified this with the core SDK maintainers. The AzureRM Terraform provider, which uses the same SDK, also does not support container apps, regardless of the ability to set the MSI endpoint. While we should be setting the MSI endpoint, adding the ability to do so would not make the SDK work with the different type of MSI authentication. From what I understand, it is slightly different from how VMs work and would require changes to the SDK, which are not currently planned.

I would definitely welcome a contribution to allow changing the MSI endpoint, but I do believe that would just get you another error

Here's a relatively current issue showing a TF Provider user unable to connect to ACA https://github.com/hashicorp/terraform-provider-azurerm/issues/25726 even when able to set their MSI url

JenGoldstrich, May 08 '24 14:05

thanks for the update @JenGoldstrich and I hope you enjoy your villa in the Bahamas after the IBM buy out!! :-)

joeybdub, May 08 '24 15:05

Is this still the answer for container apps using managed identities? I'm seeing this and after many google searches came upon this issue. I'm seeing ...

==> azure-arm.windows_image: Running builder ...
Build 'azure-arm.windows_image' errored after 2 milliseconds 230 microseconds: error fetching subscriptionID from VM metadata service for Managed Identity authentication: invalid character 'p' after top-level value

jwshive, Apr 17 '25 15:04
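On the parse error in the last comment: the build failed after about 2 milliseconds rather than timing out, which suggests something answered the metadata request but not with JSON. The sketch below reproduces that class of error with Go's encoding/json. The response body "404 page not found" is purely an assumed example, not something confirmed in this thread, and `computeInfo` and `parseSubscriptionID` are hypothetical names rather than the plugin's own.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// computeInfo holds only the field this sketch needs. The JSON key is an
// assumption about the metadata response shape.
type computeInfo struct {
	SubscriptionID string `json:"subscriptionId"`
}

// parseSubscriptionID decodes a metadata response body and returns the
// subscription ID, or the raw decoder error if the body is not valid JSON.
func parseSubscriptionID(body []byte) (string, error) {
	var info computeInfo
	if err := json.Unmarshal(body, &info); err != nil {
		return "", err
	}
	return info.SubscriptionID, nil
}

func main() {
	// Assumed plain-text reply: "404" parses as a complete JSON number, so
	// the decoder complains about the first character after it.
	_, err := parseSubscriptionID([]byte("404 page not found"))
	fmt.Println(err)

	// A well-formed body decodes normally.
	id, err := parseSubscriptionID([]byte(`{"subscriptionId":"sub-123"}`))
	fmt.Println(id, err)
}
```

If this is what is happening, logging the raw response body and status code from the endpoint would show what is actually answering on that address.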