404 Error when creating azurerm_shared_image: Can not perform requested operation on nested resource. Parent resource not found

Prashanth684 opened this issue 2 years ago · 11 comments

Is there an existing issue for this?

  • [X] I have searched the existing issues

Community Note

  • Please vote on this issue by adding a :thumbsup: reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Terraform Version

1.0.11

AzureRM Provider Version

3.19.1

Affected Resource(s)/Data Source(s)

azurerm_shared_image

Terraform Configuration Files

resource "azurerm_shared_image_gallery" "sig" {
  name                = "gallery_${replace(var.cluster_id, "-", "_")}"
  resource_group_name = data.azurerm_resource_group.main.name
  location            = var.azure_region
}

resource "azurerm_shared_image" "clustergen2" {
  name                = "${var.cluster_id}-gen2"
  gallery_name        = azurerm_shared_image_gallery.sig.name
  resource_group_name = data.azurerm_resource_group.main.name
  location            = var.azure_region
  os_type             = "Linux"
  hyper_v_generation  = "V2"
  architecture        = var.azure_vm_architecture

  identifier {
    publisher = "RedHat-gen2"
    offer     = "rhcos-gen2"
    sku       = "gen2"
  }
}

resource "azurerm_shared_image_version" "clustergen2_image_version" {
  name                = "0.0.1"
  gallery_name        = azurerm_shared_image.clustergen2.gallery_name
  image_name          = azurerm_shared_image.clustergen2.name
  resource_group_name = azurerm_shared_image.clustergen2.resource_group_name
  location            = azurerm_shared_image.clustergen2.location

  blob_uri           = azurerm_storage_blob.rhcos_image.url
  storage_account_id = azurerm_storage_account.cluster.id

  target_region {
    name                   = azurerm_shared_image.clustergen2.location
    regional_replica_count = 1
  }
}

Debug Output/Panic Output

time="2022-09-06T18:08:55Z" level=debug msg="azurerm_shared_image_gallery.sig: Creation complete after 34s [id=/subscriptions/d38f1e38-4bed-438e-b227-833f997adf6a/resourceGroups/ci-op-2h3vjhpr-2dc90-7tk4z-rg/providers/Microsoft.Compute/galleries/gallery_ci_op_2h3vjhpr_2dc90_7tk4z]"
time="2022-09-06T18:08:55Z" level=debug msg="azurerm_shared_image.clustergen2: Creating..."
time="2022-09-06T18:08:55Z" level=debug msg="azurerm_shared_image.cluster: Creating..."
time="2022-09-06T18:08:59Z" level=debug msg="azurerm_storage_blob.rhcos_image: Still creating... [10s elapsed]"
time="2022-09-06T18:09:04Z" level=debug msg="azurerm_role_assignment.main: Still creating... [40s elapsed]"
time="2022-09-06T18:09:09Z" level=debug msg="azurerm_storage_blob.rhcos_image: Still creating... [20s elapsed]"
time="2022-09-06T18:09:14Z" level=debug msg="azurerm_role_assignment.main: Still creating... [50s elapsed]"
time="2022-09-06T18:09:16Z" level=debug msg="azurerm_role_assignment.main: Creation complete after 51s [id=/subscriptions/d38f1e38-4bed-438e-b227-833f997adf6a/resourceGroups/ci-op-2h3vjhpr-2dc90-7tk4z-rg/providers/Microsoft.Authorization/roleAssignments/67c2a5ea-9264-0748-7798-6396c6036fec]"
time="2022-09-06T18:09:19Z" level=debug msg="azurerm_storage_blob.rhcos_image: Still creating... [30s elapsed]"
time="2022-09-06T18:09:29Z" level=debug msg="azurerm_storage_blob.rhcos_image: Still creating... [40s elapsed]"
time="2022-09-06T18:09:39Z" level=debug msg="azurerm_storage_blob.rhcos_image: Still creating... [50s elapsed]"
time="2022-09-06T18:09:49Z" level=debug msg="azurerm_storage_blob.rhcos_image: Still creating... [1m0s elapsed]"
time="2022-09-06T18:09:59Z" level=debug msg="azurerm_storage_blob.rhcos_image: Still creating... [1m10s elapsed]"
time="2022-09-06T18:10:09Z" level=debug msg="azurerm_storage_blob.rhcos_image: Still creating... [1m20s elapsed]"
time="2022-09-06T18:10:19Z" level=debug msg="azurerm_storage_blob.rhcos_image: Still creating... [1m30s elapsed]"
time="2022-09-06T18:10:29Z" level=debug msg="azurerm_storage_blob.rhcos_image: Still creating... [1m40s elapsed]"
time="2022-09-06T18:10:39Z" level=debug msg="azurerm_storage_blob.rhcos_image: Still creating... [1m50s elapsed]"
time="2022-09-06T18:10:49Z" level=debug msg="azurerm_storage_blob.rhcos_image: Still creating... [2m0s elapsed]"
time="2022-09-06T18:10:59Z" level=debug msg="azurerm_storage_blob.rhcos_image: Still creating... [2m10s elapsed]"
time="2022-09-06T18:11:09Z" level=debug msg="azurerm_storage_blob.rhcos_image: Still creating... [2m20s elapsed]"
time="2022-09-06T18:11:19Z" level=debug msg="azurerm_storage_blob.rhcos_image: Still creating... [2m30s elapsed]"
time="2022-09-06T18:11:29Z" level=debug msg="azurerm_storage_blob.rhcos_image: Still creating... [2m40s elapsed]"
time="2022-09-06T18:11:39Z" level=debug msg="azurerm_storage_blob.rhcos_image: Still creating... [2m50s elapsed]"
time="2022-09-06T18:11:49Z" level=debug msg="azurerm_storage_blob.rhcos_image: Still creating... [3m0s elapsed]"
time="2022-09-06T18:11:59Z" level=debug msg="azurerm_storage_blob.rhcos_image: Still creating... [3m10s elapsed]"
time="2022-09-06T18:12:04Z" level=debug msg="azurerm_storage_blob.rhcos_image: Creation complete after 3m16s [id=https://clusteromnyi.blob.core.windows.net/vhd/rhcosomnyi.vhd]"
time="2022-09-06T18:12:04Z" level=error
time="2022-09-06T18:12:04Z" level=error msg="Error: creating/updating Shared Image: (Image Name \"ci-op-2h3vjhpr-2dc90-7tk4z\" / Gallery Name \"gallery_ci_op_2h3vjhpr_2dc90_7tk4z\" / Resource Group \"ci-op-2h3vjhpr-2dc90-7tk4z-rg\"): compute.GalleryImagesClient#CreateOrUpdate: Failure sending request: StatusCode=404 -- Original Error: Code=\"ParentResourceNotFound\" Message=\"Can not perform requested operation on nested resource. Parent resource 'gallery_ci_op_2h3vjhpr_2dc90_7tk4z' not found.\""
time="2022-09-06T18:12:04Z" level=error
time="2022-09-06T18:12:04Z" level=error msg="  with azurerm_shared_image.cluster,"
time="2022-09-06T18:12:04Z" level=error msg="  on main.tf line 104, in resource \"azurerm_shared_image\" \"cluster\":"
time="2022-09-06T18:12:04Z" level=error msg=" 104: resource \"azurerm_shared_image\" \"cluster\" {"
time="2022-09-06T18:12:04Z" level=error
time="2022-09-06T18:12:04Z" level=error

Expected Behaviour

The shared image should be created successfully in the image gallery.

Actual Behaviour

time="2022-09-06T18:12:04Z" level=error msg="Error: creating/updating Shared Image: (Image Name "ci-op-2h3vjhpr-2dc90-7tk4z" / Gallery Name "gallery_ci_op_2h3vjhpr_2dc90_7tk4z" / Resource Group "ci-op-2h3vjhpr-2dc90-7tk4z-rg"): compute.GalleryImagesClient#CreateOrUpdate: Failure sending request: StatusCode=404 -- Original Error: Code="ParentResourceNotFound" Message="Can not perform requested operation on nested resource. Parent resource 'gallery_ci_op_2h3vjhpr_2dc90_7tk4z' not found.""

Steps to Reproduce

No response

Important Factoids

No response

References

I hit this issue when I opened a PR to use image galleries in OpenShift. Our Azure e2e job hit this error; the discussion is here: https://github.com/openshift/installer/pull/6304#issuecomment-1238759075

Prashanth684 · Sep 07 '22

Hi @Prashanth684, thanks for opening the issue. The dependency is auto-generated by the statement gallery_name = azurerm_shared_image_gallery.sig.name, so no depends_on is required in azurerm_shared_image.clustergen2. However, there seems to be a caching issue where, even though the gallery is created successfully, it cannot be found. I've tested a few times and unfortunately have not been able to repro it locally; is this issue consistently occurring in your environment? I'll try a few more times, and we may need to add a retry in the provider based on the 404 response when creating the image.

myc2h6o · Sep 08 '22

The issue is not consistent and only happens in our CI environment. The CI environment has many clusters created at the same time and might be heavily loaded at times; I am not sure if that contributes to this. But I was not able to reproduce it with manual testing either.

Prashanth684 · Sep 08 '22

@grayzu - Is this something the team can look into? @Prashanth684 - Does it happen with a specific version or the latest version? Have you tried using another version of Terraform?

sandeepraichura · Sep 08 '22

You mean the azurerm provider? If so, I am using 3.19.1.

Prashanth684 · Sep 08 '22

Also, sometimes the error is seen when creating the image version, where it cannot find the image definition:

level=error msg=Error: creating Shared Image Version: (Version Name "412.86.20220810" / Image Name "ci-op-7l9zv3g1-2a01b-g4vfv" / Gallery Name "gallery_ci_op_7l9zv3g1_2a01b_g4vfv" / Resource Group "ci-op-7l9zv3g1-2a01b-exrg"): compute.GalleryImageVersionsClient#CreateOrUpdate: Failure sending request: StatusCode=404 -- Original Error: Code="ParentResourceNotFound" Message="Can not perform requested operation on nested resource. Parent resource 'gallery_ci_op_7l9zv3g1_2a01b_g4vfv/ci-op-7l9zv3g1-2a01b-g4vfv' not found."
level=error
level=error msg=  with azurerm_shared_image_version.cluster_image_version,
level=error msg=  on main.tf line 138, in resource "azurerm_shared_image_version" "cluster_image_version":
level=error msg= 138: resource "azurerm_shared_image_version" "cluster_image_version" {
level=error

Prashanth684 · Sep 13 '22

Any updates on this? This issue is becoming a blocker for our enablement of arm64 VM provisioning on Azure.

Prashanth684 · Sep 16 '22

I found that even with the 30-second sleep in the updated PR, image creation still has trouble finding the gallery, even though the gallery details can be retrieved when setting the properties in azurerm_shared_image_gallery after its creation. I would suggest opening a support ticket with Azure to check whether there is any cache issue within the subscription. Meanwhile, since we are seeing this issue, I think we need to apply a fix to azurerm_shared_image and azurerm_shared_image_version similar to the code below: we will need to add a retry on the 404 HTTP status code in the provider. https://github.com/hashicorp/terraform-provider-azurerm/blob/de964e86cb80ca3a31246297eaa9a7b51219c587/internal/services/apimanagement/api_management_subscription_resource.go#L193-L198

myc2h6o · Sep 21 '22

I just tried adding the retry locally for testing, and I'm seeing that even with this retry we need something like a depends_on to make sure the resources get created. I have a module trying to create a VM, and it throws an error despite the retry because the resource has not been created yet.

Prashanth684 · Sep 21 '22

@myc2h6o based on your comment above, I tried using the retry mechanism in the provider: https://github.com/openshift/installer/pull/6304/commits/56e2a1c0db552d909e12c689953ef881d39042f3 but I still had to add a timed wait, because it looks like the function returns immediately (asynchronously) and the image version is not created yet. Although the results at first glance seem positive (I don't see the 404 error on the Azure jobs), maybe there is a way to refine it?

Prashanth684 · Sep 22 '22

Hi @Prashanth684, I've created #18496 to retry the Create when a 404 happens. It will retry until the resource operation timeout is reached. As for depends_on: if resource A references a property of resource B, the dependency is generated automatically and doesn't require an explicit depends_on, as in the case below:

resource "azurerm_shared_image" "clustergen2" {
  gallery_name        = azurerm_shared_image_gallery.sig.name
  ...
}

As for the temporary workaround of adding time_sleep.wait_30_seconds: this requires a depends_on in the time_sleep resource as well as in the resources that need to wait 30 seconds before creation (see the sketch below). For module dependencies, if module A depends on module B, B needs to have an output variable and A needs to reference that variable. More details can be found at https://stackoverflow.com/a/58277124
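
A minimal sketch of that workaround, assuming the time_sleep resource from the hashicorp/time provider; the wiring below reuses the resources from the config above, and the 30-second duration is only illustrative:

resource "time_sleep" "wait_30_seconds" {
  # Start counting only once the gallery has actually been created.
  depends_on      = [azurerm_shared_image_gallery.sig]
  create_duration = "30s"
}

resource "azurerm_shared_image" "clustergen2" {
  name                = "${var.cluster_id}-gen2"
  gallery_name        = azurerm_shared_image_gallery.sig.name
  resource_group_name = data.azurerm_resource_group.main.name
  location            = var.azure_region
  os_type             = "Linux"
  # (identifier block and remaining arguments as in the original config)

  # Explicit dependency so creation starts only after the sleep elapses.
  depends_on = [time_sleep.wait_30_seconds]
}

Note that this only masks the propagation delay, which is why the retry in #18496 is the more durable fix.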

myc2h6o · Sep 22 '22

Thanks! I was able to test the PR you put up (as part of my PR), and I did not see the 404 errors in the Azure jobs, although there was an image creation timeout which I am not sure is related.

Prashanth684 · Sep 23 '22

Hi @Prashanth684, since I closed the PR, let's continue the discussion in this issue. Regarding enabling debug logs (https://www.terraform.io/internals/debugging): the previous log you shared does not seem to have the environment variable TF_LOG set to "DEBUG". With debug logging enabled, you would see something like the output below, with the request URI and JSON payload, which would help with the troubleshooting. The X-Ms-Correlation-Request-Id can also be used to identify the request on the Azure end.

PUT /subscriptions/***/resourcegroups/test_rg?api-version=2020-06-01 HTTP/1.1
Host: management.azure.com
User-Agent: Go/go1.18.3 (amd64-windows) go-autorest/v14.2.1 Azure-SDK-For-Go/v66.0.0 resources/2020-06-01 HashiCorp Terraform/1.2.1 (+https://www.terraform.io) Terraform Plugin SDK/2.10.1 terraform-provider-azurerm/dev pid-222c6c49-1b0a-5959-a213-6608f9eb8820
Content-Length: 31
Content-Type: application/json; charset=utf-8
X-Ms-Authorization-Auxiliary:
X-Ms-Correlation-Request-Id: 70ecaa60-adff-1ebf-8702-562b1430a19f
Accept-Encoding: gzip

{"location":"eastus","tags":{}}: timestamp=2022-09-26T12:34:07.479+0800

myc2h6o · Sep 26 '22

Sorry, that was my bad. We enabled debug logs, but unfortunately have not reproduced this issue with them enabled yet. I will let you know when we do.

Prashanth684 · Sep 27 '22

Reproduced the issue with debug logs here: https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/openshift_installer/6304/pull-ci-openshift-installer-master-e2e-azure-ovn/1574756406718894080/artifacts/e2e-azure-ovn/ipi-install-install/artifacts/.openshift_install.log

Prashanth684 · Sep 27 '22

@myc2h6o do the logs help?

Prashanth684 · Sep 28 '22

Hi @Prashanth684, the logs you shared should help with the troubleshooting. I've shared them internally with the service team. Meanwhile, I would suggest you also open a support ticket with Azure for this subscription, sharing this log as well, to get official support.

myc2h6o · Sep 29 '22

@myc2h6o Note that there are two additional issues for similar problems.

  • https://github.com/hashicorp/terraform-provider-azurerm/issues/18350
  • https://github.com/hashicorp/terraform-provider-azurerm/issues/18311

We also know for sure the problem started during the day on 9/1.

Have you heard back from the service team? We have had a ticket open for several weeks with little progress from the support side.

stbenjam · Sep 29 '22

It seems like the Azure ARM API errors have been addressed on the service side, at least in the regions in which we operate. This can likely be closed, though it would be nice to have assurances that those defects won't reappear in the future.

sdodson · Oct 25 '22

This issue is still occurring for us when deploying Azure SQL to the East region.

jonathaneckman · Apr 12 '23

Hi @jonathaneckman, the previous issue seems to have been resolved on the service side, so yours may be different. I'd suggest opening a new issue describing your scenario and sharing the Terraform config you are using. Meanwhile, there could also be an issue on the Azure side, so I'd recommend opening a support ticket with Azure as well.

myc2h6o · Apr 13 '23

Thanks for taking the time to open this issue. It appears the original issue was resolved on the service side, as such I am going to mark this issue as closed. If there is still an issue on the latest version of the provider please do let us know by opening a new issue, thanks!

rcskosir · Jun 12 '24

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

github-actions[bot] · Jul 13 '24