terraform-provider-azurerm
404 Error when creating azurerm_shared_image: Can not perform requested operation on nested resource. Parent resource not found
Terraform Version
1.0.11
AzureRM Provider Version
3.19.1
Affected Resource(s)/Data Source(s)
azurerm_shared_image
Terraform Configuration Files
resource "azurerm_shared_image_gallery" "sig" {
  name                = "gallery_${replace(var.cluster_id, "-", "_")}"
  resource_group_name = data.azurerm_resource_group.main.name
  location            = var.azure_region
}

resource "azurerm_shared_image" "clustergen2" {
  name                = "${var.cluster_id}-gen2"
  gallery_name        = azurerm_shared_image_gallery.sig.name
  resource_group_name = data.azurerm_resource_group.main.name
  location            = var.azure_region
  os_type             = "Linux"
  hyper_v_generation  = "V2"
  architecture        = var.azure_vm_architecture

  identifier {
    publisher = "RedHat-gen2"
    offer     = "rhcos-gen2"
    sku       = "gen2"
  }
}

resource "azurerm_shared_image_version" "clustergen2_image_version" {
  name                = "0.0.1"
  gallery_name        = azurerm_shared_image.clustergen2.gallery_name
  image_name          = azurerm_shared_image.clustergen2.name
  resource_group_name = azurerm_shared_image.clustergen2.resource_group_name
  location            = azurerm_shared_image.clustergen2.location
  blob_uri            = azurerm_storage_blob.rhcos_image.url
  storage_account_id  = azurerm_storage_account.cluster.id

  target_region {
    name                   = azurerm_shared_image.clustergen2.location
    regional_replica_count = 1
  }
}
Debug Output/Panic Output
time="2022-09-06T18:08:55Z" level=debug msg="azurerm_shared_image_gallery.sig: Creation complete after 34s [id=/subscriptions/d38f1e38-4bed-438e-b227-833f997adf6a/resourceGroups/ci-op-2h3vjhpr-2dc90-7tk4z-rg/providers/Microsoft.Compute/galleries/gallery_ci_op_2h3vjhpr_2dc90_7tk4z]"
time="2022-09-06T18:08:55Z" level=debug msg="azurerm_shared_image.clustergen2: Creating..."
time="2022-09-06T18:08:55Z" level=debug msg="azurerm_shared_image.cluster: Creating..."
time="2022-09-06T18:08:59Z" level=debug msg="azurerm_storage_blob.rhcos_image: Still creating... [10s elapsed]"
time="2022-09-06T18:09:04Z" level=debug msg="azurerm_role_assignment.main: Still creating... [40s elapsed]"
time="2022-09-06T18:09:09Z" level=debug msg="azurerm_storage_blob.rhcos_image: Still creating... [20s elapsed]"
time="2022-09-06T18:09:14Z" level=debug msg="azurerm_role_assignment.main: Still creating... [50s elapsed]"
time="2022-09-06T18:09:16Z" level=debug msg="azurerm_role_assignment.main: Creation complete after 51s [id=/subscriptions/d38f1e38-4bed-438e-b227-833f997adf6a/resourceGroups/ci-op-2h3vjhpr-2dc90-7tk4z-rg/providers/Microsoft.Authorization/roleAssignments/67c2a5ea-9264-0748-7798-6396c6036fec]"
time="2022-09-06T18:09:19Z" level=debug msg="azurerm_storage_blob.rhcos_image: Still creating... [30s elapsed]"
time="2022-09-06T18:09:29Z" level=debug msg="azurerm_storage_blob.rhcos_image: Still creating... [40s elapsed]"
time="2022-09-06T18:09:39Z" level=debug msg="azurerm_storage_blob.rhcos_image: Still creating... [50s elapsed]"
time="2022-09-06T18:09:49Z" level=debug msg="azurerm_storage_blob.rhcos_image: Still creating... [1m0s elapsed]"
time="2022-09-06T18:09:59Z" level=debug msg="azurerm_storage_blob.rhcos_image: Still creating... [1m10s elapsed]"
time="2022-09-06T18:10:09Z" level=debug msg="azurerm_storage_blob.rhcos_image: Still creating... [1m20s elapsed]"
time="2022-09-06T18:10:19Z" level=debug msg="azurerm_storage_blob.rhcos_image: Still creating... [1m30s elapsed]"
time="2022-09-06T18:10:29Z" level=debug msg="azurerm_storage_blob.rhcos_image: Still creating... [1m40s elapsed]"
time="2022-09-06T18:10:39Z" level=debug msg="azurerm_storage_blob.rhcos_image: Still creating... [1m50s elapsed]"
time="2022-09-06T18:10:49Z" level=debug msg="azurerm_storage_blob.rhcos_image: Still creating... [2m0s elapsed]"
time="2022-09-06T18:10:59Z" level=debug msg="azurerm_storage_blob.rhcos_image: Still creating... [2m10s elapsed]"
time="2022-09-06T18:11:09Z" level=debug msg="azurerm_storage_blob.rhcos_image: Still creating... [2m20s elapsed]"
time="2022-09-06T18:11:19Z" level=debug msg="azurerm_storage_blob.rhcos_image: Still creating... [2m30s elapsed]"
time="2022-09-06T18:11:29Z" level=debug msg="azurerm_storage_blob.rhcos_image: Still creating... [2m40s elapsed]"
time="2022-09-06T18:11:39Z" level=debug msg="azurerm_storage_blob.rhcos_image: Still creating... [2m50s elapsed]"
time="2022-09-06T18:11:49Z" level=debug msg="azurerm_storage_blob.rhcos_image: Still creating... [3m0s elapsed]"
time="2022-09-06T18:11:59Z" level=debug msg="azurerm_storage_blob.rhcos_image: Still creating... [3m10s elapsed]"
time="2022-09-06T18:12:04Z" level=debug msg="azurerm_storage_blob.rhcos_image: Creation complete after 3m16s [id=https://clusteromnyi.blob.core.windows.net/vhd/rhcosomnyi.vhd]"
time="2022-09-06T18:12:04Z" level=error
time="2022-09-06T18:12:04Z" level=error msg="Error: creating/updating Shared Image: (Image Name \"ci-op-2h3vjhpr-2dc90-7tk4z\" / Gallery Name \"gallery_ci_op_2h3vjhpr_2dc90_7tk4z\" / Resource Group \"ci-op-2h3vjhpr-2dc90-7tk4z-rg\"): compute.GalleryImagesClient#CreateOrUpdate: Failure sending request: StatusCode=404 -- Original Error: Code=\"ParentResourceNotFound\" Message=\"Can not perform requested operation on nested resource. Parent resource 'gallery_ci_op_2h3vjhpr_2dc90_7tk4z' not found.\""
time="2022-09-06T18:12:04Z" level=error
time="2022-09-06T18:12:04Z" level=error msg=" with azurerm_shared_image.cluster,"
time="2022-09-06T18:12:04Z" level=error msg=" on main.tf line 104, in resource \"azurerm_shared_image\" \"cluster\":"
time="2022-09-06T18:12:04Z" level=error msg=" 104: resource \"azurerm_shared_image\" \"cluster\" {"
time="2022-09-06T18:12:04Z" level=error
time="2022-09-06T18:12:04Z" level=error
Expected Behaviour
The shared image should be created successfully in the image gallery.
Actual Behaviour
time="2022-09-06T18:12:04Z" level=error msg="Error: creating/updating Shared Image: (Image Name "ci-op-2h3vjhpr-2dc90-7tk4z" / Gallery Name "gallery_ci_op_2h3vjhpr_2dc90_7tk4z" / Resource Group "ci-op-2h3vjhpr-2dc90-7tk4z-rg"): compute.GalleryImagesClient#CreateOrUpdate: Failure sending request: StatusCode=404 -- Original Error: Code="ParentResourceNotFound" Message="Can not perform requested operation on nested resource. Parent resource 'gallery_ci_op_2h3vjhpr_2dc90_7tk4z' not found.""
Steps to Reproduce
No response
Important Factoids
No response
References
I hit this issue when I opened a PR to use image galleries in OpenShift. Our e2e job for Azure hit this error, and the discussion is here: https://github.com/openshift/installer/pull/6304#issuecomment-1238759075
Hi @Prashanth684, thanks for opening the issue. The dependency is generated automatically by the statement `gallery_name = azurerm_shared_image_gallery.sig.name`, so no `depends_on` is required in `azurerm_shared_image.clustergen2`. However, there seems to be a caching issue: even though the gallery is created successfully, it cannot be found. I've tested a few times and unfortunately have not been able to reproduce it locally. Is this issue occurring consistently in your environment? I'll try a few more times, and we may need to add a retry in the provider based on the 404 response when creating the image.
The issue is not consistent and only happens in our CI environment. The CI environment creates many clusters at the same time and might be heavily loaded at times; I am not sure whether that could contribute to this. I was not able to reproduce it in manual testing either.
@grayzu - Is this something the team can look into? @Prashanth684 - Does it happen with a specific version or with the latest version? Have you tried using another version of Terraform?
You mean the azurerm provider? If so, I am using 3.19.1.
Also, the error is sometimes seen when creating the image version, where it could not find the image definition:
level=error msg=Error: creating Shared Image Version: (Version Name "412.86.20220810" / Image Name "ci-op-7l9zv3g1-2a01b-g4vfv" / Gallery Name "gallery_ci_op_7l9zv3g1_2a01b_g4vfv" / Resource Group "ci-op-7l9zv3g1-2a01b-exrg"): compute.GalleryImageVersionsClient#CreateOrUpdate: Failure sending request: StatusCode=404 -- Original Error: Code="ParentResourceNotFound" Message="Can not perform requested operation on nested resource. Parent resource 'gallery_ci_op_7l9zv3g1_2a01b_g4vfv/ci-op-7l9zv3g1-2a01b-g4vfv' not found."
level=error
level=error msg= with azurerm_shared_image_version.cluster_image_version,
level=error msg= on main.tf line 138, in resource "azurerm_shared_image_version" "cluster_image_version":
level=error msg= 138: resource "azurerm_shared_image_version" "cluster_image_version" {
level=error
Any updates on this? This issue is becoming a blocker for our enablement of arm64 VM provisioning on Azure.
I found that even with the 30-second sleep in the updated PR, image creation still fails to find the gallery, even though the gallery details could be retrieved when setting the properties in `azurerm_shared_image_gallery` after its creation. I would suggest opening a support ticket with Azure to check whether there is a cache issue within the subscription.
Meanwhile, since we are seeing this issue, I think we need to apply a fix to `azurerm_shared_image` and `azurerm_shared_image_version` similar to the code below: we will need to add a retry on the 404 HTTP status code in the provider.
https://github.com/hashicorp/terraform-provider-azurerm/blob/de964e86cb80ca3a31246297eaa9a7b51219c587/internal/services/apimanagement/api_management_subscription_resource.go#L193-L198
I just tried adding the retry locally for testing, and I'm seeing that even with this retry we seem to need something like a `depends_on` to make sure that the resources get created? I have a module that tries to create a VM, and it throws an error with the retry because the resource has not been created yet.
@myc2h6o Based on your comment above, I tried using the retry mechanism in the provider: https://github.com/openshift/installer/pull/6304/commits/56e2a1c0db552d909e12c689953ef881d39042f3. However, I still had to add a timed wait, because it looks like the function returns immediately (asynchronously) and the image version is not created yet. The results at first glance seem positive (I don't see 404 errors on the Azure jobs), but maybe there is a way to refine it?
Hi @Prashanth684 I've created #18496 to retry the Create when 404 happens. It will retry until the resource operation timeout.
As for `depends_on`: if resource A references a property of resource B, the dependency is generated automatically and no explicit `depends_on` is required, as in the case below:
resource "azurerm_shared_image" "clustergen2" {
  gallery_name = azurerm_shared_image_gallery.sig.name
  ...
}
As for the temporary workaround of adding `time_sleep.wait_30_seconds`, this requires `depends_on` in the `time_sleep` resource as well as in the resources that need to wait 30 seconds before creation.
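A minimal sketch of that workaround, assuming the `hashicorp/time` provider is available and reusing the resource names from the configuration at the top of this issue. The 30-second duration is only the value discussed in this thread, and as noted earlier it did not always avoid the 404:

```hcl
# Sketch only: requires the hashicorp/time provider.
resource "time_sleep" "wait_30_seconds" {
  # Start the timer only after the gallery has been created.
  depends_on      = [azurerm_shared_image_gallery.sig]
  create_duration = "30s"
}

resource "azurerm_shared_image" "clustergen2" {
  # Explicitly wait for the sleep to finish before creating the image definition.
  depends_on = [time_sleep.wait_30_seconds]

  name                = "${var.cluster_id}-gen2"
  gallery_name        = azurerm_shared_image_gallery.sig.name
  resource_group_name = data.azurerm_resource_group.main.name
  location            = var.azure_region
  os_type             = "Linux"
  hyper_v_generation  = "V2"
  architecture        = var.azure_vm_architecture

  identifier {
    publisher = "RedHat-gen2"
    offer     = "rhcos-gen2"
    sku       = "gen2"
  }
}
```

The `depends_on` on the `time_sleep` makes the timer start after the gallery exists, and the `depends_on` on the image makes it wait for the timer. Without both, the sleep could run in parallel with the resources it is supposed to separate.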
For module dependencies, if module A depends on module B, B needs to have an output variable, and A needs to reference that variable. More details can be found at https://stackoverflow.com/a/58277124
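The module case could look roughly like the sketch below. The module names (`gallery`, `vm`), their paths, and the `gallery_name` variable are hypothetical and only illustrate the output-to-input wiring:

```hcl
# modules/gallery/outputs.tf (hypothetical module B)
output "gallery_name" {
  value = azurerm_shared_image_gallery.sig.name
}

# modules/vm/variables.tf (hypothetical module A)
variable "gallery_name" {
  type = string
}

# main.tf (root module)
module "gallery" {
  source = "./modules/gallery"
}

module "vm" {
  source = "./modules/vm"

  # Referencing module B's output creates the dependency:
  # resources in "vm" that use var.gallery_name wait for the gallery.
  gallery_name = module.gallery.gallery_name
}
```

Only resources inside module A that actually use the variable pick up the dependency, so the value has to be consumed somewhere for the ordering to apply.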
Thanks! I was able to test the PR that you put up (as part of my PR) and did not see the 404 errors in the Azure jobs, although there was an image creation timeout, which I am not sure is related.
Hi @Prashanth684, since I closed the PR, let's continue the discussion in this issue.
Regarding debug logging (https://www.terraform.io/internals/debugging): the previous log you shared does not appear to have been captured with the environment variable `TF_LOG` set to `"DEBUG"`. After enabling debug logging, you will see something like the output below, including the request URI and JSON payload, which helps with troubleshooting. The `X-Ms-Correlation-Request-Id` can also be used to identify the request on the Azure side.
PUT /subscriptions/***/resourcegroups/test_rg?api-version=2020-06-01 HTTP/1.1
Host: management.azure.com
User-Agent: Go/go1.18.3 (amd64-windows) go-autorest/v14.2.1 Azure-SDK-For-Go/v66.0.0 resources/2020-06-01 HashiCorp Terraform/1.2.1 (+https://www.terraform.io) Terraform Plugin SDK/2.10.1 terraform-provider-azurerm/dev pid-222c6c49-1b0a-5959-a213-6608f9eb8820
Content-Length: 31
Content-Type: application/json; charset=utf-8
X-Ms-Authorization-Auxiliary:
X-Ms-Correlation-Request-Id: 70ecaa60-adff-1ebf-8702-562b1430a19f
Accept-Encoding: gzip
{"location":"eastus","tags":{}}: timestamp=2022-09-26T12:34:07.479+0800
Sorry, that was my bad. We enabled debug logs, but unfortunately have not reproduced this issue yet with debug logs enabled. I will let you know when we do.
Reproduced the issue with debug logs here: https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/openshift_installer/6304/pull-ci-openshift-installer-master-e2e-azure-ovn/1574756406718894080/artifacts/e2e-azure-ovn/ipi-install-install/artifacts/.openshift_install.log
@myc2h6o do the logs help?
Hi @Prashanth684, the logs you shared should help with the troubleshooting. I've shared them internally with the service team. Meanwhile, I would suggest you also open a support ticket with Azure for this subscription and share this log there as well, to get official support.
@myc2h6o Note that there are 2 additional issues for similar problems.
- https://github.com/hashicorp/terraform-provider-azurerm/issues/18350
- https://github.com/hashicorp/terraform-provider-azurerm/issues/18311
We also know for sure the problem started during the day on 9/1.
Have you heard back from the service team? We have had a ticket open for several weeks with little progress from the support side.
It seems like the Azure ARM API errors have been addressed on the service side, at least in the regions in which we operate. This can likely be closed though it'd be nice to have assurances that those defects won't re-appear in the future.
This issue is still occurring for us when deploying Azure SQL to the East region.
Hi @jonathaneckman, the previous issue seems to be resolved on the service side, so yours may be a different one. I'd suggest you open a new issue describing your scenario and share the Terraform config you are using. Meanwhile, there could also be an issue on the Azure side, so I'd recommend opening a support ticket with Azure as well.
Thanks for taking the time to open this issue. It appears the original issue was resolved on the service side, so I am going to mark this issue as closed. If there is still an issue on the latest version of the provider, please let us know by opening a new issue. Thanks!