
v4 removed private vnet integration for API server in azurerm_kubernetes_cluster

LiamLeane opened this issue 1 year ago • 18 comments

Is there an existing issue for this?

  • [X] I have searched the existing issues

Community Note

  • Please vote on this issue by adding a :thumbsup: reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave comments along the lines of "+1", "me too" or "any updates", they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment and review the contribution guide to help.

Terraform Version

OpenTofu 1.8.2

AzureRM Provider Version

4.4.0

Affected Resource(s)/Data Source(s)

azurerm_kubernetes_cluster

Terraform Configuration Files

resource "azurerm_kubernetes_cluster" "k8s" {
  api_server_access_profile {
    subnet_id                = var.api_server_subnet_id # Unsupported in 4.x (removed with the move to the stable API)
    vnet_integration_enabled = true                     # Unsupported in 4.x (removed with the move to the stable API)
  }
}

Debug Output/Panic Output

│ Error: Unsupported argument
│ 
│   on ..\..\k8.tf line 118, in resource "azurerm_kubernetes_cluster" "k8s":
│  118:     subnet_id                = var.api_server_subnet_id
│ 
│ An argument named "subnet_id" is not expected here.
╵
╷
│ Error: Unsupported argument
│ 
│   on ..\..\k8.tf line 119, in resource "azurerm_kubernetes_cluster" "k8s":
│  119:     vnet_integration_enabled = true
│ 
│ An argument named "vnet_integration_enabled" is not expected here.

Expected Behaviour

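The API server can be VNet-integrated (the subnet_id and vnet_integration_enabled arguments are accepted in api_server_access_profile).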

Actual Behaviour

The API server cannot be VNet-integrated; both arguments are rejected as unsupported.

I understand why this occurred with the migration to the stable API. However, this specific case should have been exempted: API server VNet integration is required for key_management_service.key_vault_network_access = "Private", which in turn is required by essentially every security and compliance standard that exists.

Unless egress uses a user-defined resource, the IP that originates the KMS calls is not known until after the cluster is created, which forces an unsafe azurerm_key_vault configuration with a default allow rule. Having public internet access enabled on AKV at all is inherently unsafe, and this compounds the problem.
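For context, here is a minimal sketch of the combination being described, written against the 3.x argument names used in the configuration above. The KMS plugin reaches a Key Vault whose firewall denies traffic by default. Per the argument above, that only works with API server VNet integration. The resource names, the variables (`var.api_server_subnet_id`, `var.kms_key_id`), and the omitted required arguments are illustrative placeholders and are not taken from the reporter's configuration.

```hcl
# Sketch for azurerm 3.x only: the api_server_access_profile arguments below were removed in 4.0.
resource "azurerm_key_vault" "kms" {
  # name, location, resource_group_name, tenant_id, sku_name, ... omitted

  network_acls {
    default_action = "Deny"          # deny by default instead of the "default allow" rule
    bypass         = "AzureServices" # trusted Azure services may still bypass the firewall
  }
}

resource "azurerm_kubernetes_cluster" "k8s" {
  # name, location, resource_group_name, dns_prefix, default_node_pool, identity, ... omitted

  api_server_access_profile {
    subnet_id                = var.api_server_subnet_id # hypothetical variable
    vnet_integration_enabled = true
  }

  key_management_service {
    key_vault_key_id         = var.kms_key_id # hypothetical variable
    key_vault_network_access = "Private"      # per this issue, depends on API server VNet integration
  }
}
```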

As it currently stands, this resource is not usable in professional cloud settings (it is currently entirely unusable in gov cloud, as it won't meet NIST 800-53), and AzAPI or ARM/Bicep has to be used in its place.

Steps to Reproduce

No response

Important Factoids

No response

References

No response

LiamLeane, Oct 12 '24

Hey @LiamLeane

Unfortunately we can't pick and choose which bits of the AKS resource use a preview version and which use a stable one without breaking the entire resource, so exempting this particular preview feature was not a viable option for us. We're dismayed by the loss of functionality that we can provide in the resource caused by the migration to the stable API, in particular for features that relate to security and compliance. This was a joint decision made at the request of the AKS Service Team. I suggest getting in touch with them over on Azure/AKS to see if there is a timeline for when this will go GA so we can add it back into the provider.

stephybun, Oct 16 '24

This is a showstopper for us as well, for exactly the reasons outlined by @LiamLeane. It's very disappointing that we now have to move away from Terraform, and I imagine a lot of others who require this feature for compliance and security reasons will have to as well.

iverberk, Oct 16 '24

I want to point out that Microsoft is also going to be disabling the APIs that the v3 AzureRM provider uses in the near future so we won't be able to keep creating clusters using that provider once that happens.

https://learn.microsoft.com/en-us/azure/aks/concepts-preview-api-life-cycle#upcoming-deprecations

@stephybun can you confirm what preview API version was used by the last v3.x provider release, so that we can at least know which date from the above linked table will cause this to happen and plan accordingly? Hopefully we can also reference this issue, the timeline and the API version I am asking about when talking to MS support about this because it's going to really hit people hard when they are forced onto v4.

tspearconquest, Oct 17 '24

I did some more digging over on Azure/AKS. I found something positive.

https://github.com/Azure/AKS/issues/2729

GA ETA: Q1 2025*

*ETAs are only estimates and can be subject to change

Both AKS Public and Private Cluster use a Tunnel, also a Private Cluster is currently based on Private Link. API Server VNET Integration will be enabled for both Public and Private clusters, and will allow for upgrade capabilities from Public to Private and Private to Public if required, without a redeploy.

tspearconquest, Oct 17 '24

Thanks for finding and linking that info @tspearconquest! That is certainly promising news.

The preview version used by the last v3.x release was 2023-09-02-preview which doesn't appear to have a deprecation date yet. Judging by the table it's safe to say it will be post March 10.

I'll pass this on to our contacts at MSFT to see if we can avoid a situation where the preview version is deprecated, but the feature hasn't been released and added to the provider yet.

stephybun, Oct 18 '24

I found this in the documentation: https://registry.terraform.io/providers/hashicorp/azurerm/latest/docs/guides/4.0-upgrade-guide#aks-migration-to-stable-api and https://registry.terraform.io/providers/hashicorp/azurerm/latest/docs/guides/4.0-upgrade-guide#vnet-integration

In order to provide a more reliable experience when using the Azure Provider, Microsoft has requested that the AKS resource only use a stable API version. Starting in version 4.0 of the AzureRM provider, Microsoft and HashiCorp have agreed to switch to the AKS stable APIs. This means that some preview features that are exposed in 3.x will no longer be available in 4.x.

If you wish to use these or any other AKS preview features and accept the risk that comes with using preview features, the AzAPI Provider can be used to access these features.
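One possible AzAPI workaround, in the spirit of the guide's suggestion, is a sketch with several stated assumptions. It keeps the cluster in azurerm and patches only the preview property with `azapi_update_resource`. The assumptions are:

- The azapi provider is 2.x, where `body` accepts an HCL object; 1.x expects `jsonencode(...)`.
- It targets the `2023-09-02-preview` API version, which this thread later identifies as the last one used by azurerm 3.x.
- The ARM property names are `enableVnetIntegration` and `subnetId` under `apiServerAccessProfile`.

Verify these against the AKS REST API reference before relying on the sketch.

```hcl
terraform {
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~> 4.0"
    }
    azapi = {
      source  = "Azure/azapi"
      version = "~> 2.0"
    }
  }
}

# Patch the azurerm-managed cluster with the preview-only API server settings.
resource "azapi_update_resource" "api_server_vnet_integration" {
  type        = "Microsoft.ContainerService/managedClusters@2023-09-02-preview"
  resource_id = azurerm_kubernetes_cluster.k8s.id

  body = {
    properties = {
      apiServerAccessProfile = {
        enableVnetIntegration = true
        subnetId              = var.api_server_subnet_id # hypothetical variable
      }
    }
  }
}
```

Because the patch is applied after the cluster exists, this ordering may not work when other settings on the azurerm resource need VNet integration at create time. The KMS `Private` setting discussed above is a likely example. In that case, the fallback is to define the whole cluster with `azapi_resource` or in ARM/Bicep.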

I agree this is an essential feature for compliance and security reasons.

zeqk, Nov 01 '24

@stephybun were you able to receive any indication from your MSFT contact about the preview version being deprecated before the feature becomes GA?

In general, I still think that this is a step in the wrong direction. Preview features are clearly indicated as such, and users already understand the risks of using them. Using only the stable API severely limits the usefulness of azurerm for managing AKS clusters. It means that as soon as there is a need for a preview feature (for experimentation or otherwise), users must switch either to the much less convenient AzAPI or to another infrastructure management tool entirely.

The preview API is already usable through Bicep; this is not really protecting or helping anyone, but simply reducing azurerm's usefulness for AKS compared to other tools.

jemag, Nov 26 '24

Quoting @jemag: "The preview API is already usable through Bicep; this is not really protecting or helping anyone, but simply reducing azurerm's usefulness for AKS compared to other tools."

My hot take is that doing so protects Microsoft Support teams from users who didn't read the docs beyond how to enable some preview feature they needed, and who then hit a problem with that feature that was clearly documented: preview features come with limited or no support. Microsoft imposes this on HashiCorp because the Terraform provider is not officially supported by Microsoft. They allow Bicep to continue because it is a first-party Microsoft product.

Speeddymon, Nov 26 '24

Quoting @jemag: "The preview API is already usable through Bicep; this is not really protecting or helping anyone, but simply reducing azurerm's usefulness for AKS compared to other tools."

The issue is more about how long MS allows things to remain in preview, and "preview" covers an enormous mix of states. You have APIs that are genuinely preview and highly likely to change (or never make it to GA), and you have APIs like this one, which have existed for over 4 years at this point, are almost certainly not going to change, and will absolutely make GA.

Most companies adopting an IaC tool will not choose more than one, because of the hell of state and dependency management if you do, the additional o16 overhead, etc. In this case the "preview" feature wasn't an esoteric use case but something effectively essential: you don't deploy AKS clusters with a public API server outside of a playground.

In our case we are using multiple provider versions because it's slightly less bad than provisioning the cluster via ARM.

Closing this now, as it has clearly been discussed and won't be changed. It would probably be sensible to give greater consideration to, and hold more discussion about, how the provider is actually being used versus how HashiCorp wants it to be used when a massive breaking change is coming down the line. Maybe it's just me, but finding out via the release notes was pretty surprising. Given that MS includes HCL in all their documentation, I would also hope there is some capacity for you to push back on silly ideas like this one :)

LiamLeane, Nov 27 '24

@LiamLeane the decision to stay on preview vs. move to stable ended up being discussed for about a year and in the end we didn't get sufficient evidence from our customers or the community that moving to a stable version of the API would be a blocker.

I unfortunately don't have any insight into how the feedback from our customers on this issue was gathered, but we fell short both in collecting input from the community and in informing it. While we did make a point of marking requests for preview features as blocked, and of responding to them with the explanation that "there is currently an ongoing discussion to move to stable", this was clearly not enough to raise any concern among our users.

Having the community's input on decisions like these is important and valuable for us when it comes to discussions with MSFT on the direction of the provider. We want to do better there and thinking back we should have pinned an issue to the repo on the matter whilst it was still in discussion and linked it in the community slack.

I would be interested in any feedback you/the community/anyone else on this thread have here on how you would prefer to be informed/involved in discussions such as these.

In the meantime, I'm still waiting on a response from the service team for when they plan to go GA with this feature/deprecate the last preview version in use in the provider, but I will be bringing all the feedback in this thread to both HC and MSFT.

stephybun, Dec 03 '24

Quoting @stephybun: "I would be interested in any feedback you/the community/anyone else on this thread have here on how you would prefer to be informed/involved in discussions such as these."

It would have been great if the following sequence of events had happened, and taking steps to make it possible going forward would be (IMHO) the best way to get visibility.

When a change discussion requested by an Azure Service team progresses to your team's version of an "accepted" state, the documentation pages in the Terraform Registry should have a note added at the top of the page linking to the issue where the discussion is happening. This could even be automated via metadata in the GitHub issues (<!-- ... -->) that you would ask the service team to fill in when submitting issues to your team for discussion; some process could watch issues for this sort of metadata and automatically insert a link at the top of the affected resource pages.

I can't speak for other teams but the only time I come to this GitHub is when I run into a problem with the provider or when there's a new provider release; and in neither case would I be directed to issues like this one.

I primarily look at resource documentation (in the Terraform Registry) all day, so the only way I'll know about these changes is if they're mentioned on the actual resource page, as well as separately in the upgrade guide (as was done here) once the change is implemented and ready for RC testing.

Speeddymon, Dec 03 '24

I agree that unless there is an explicit note in the documentation, it is quite hard for end users to notice until it is too late. In our case (as well as probably many others), we only noticed once the VNET integration feature was gone, which was quite alarming.

I still believe offering the preview API is the way to go, since dropping it won't prevent users from using these features, merely make them switch to Bicep or AzAPI, both very unfortunate transitions.

There is also the risk, @stephybun, that the preview API is deprecated before the feature goes GA, which would leave users with a non-working Terraform setup. There is no strong guarantee as to when VNet integration will be released, as it has been delayed countless times so far. This would be a very unfortunate breaking change.

jemag, Dec 03 '24

These are great suggestions and really helpful @Speeddymon!

Many of these requests arrive privately and are not publicly visible or available for us to link to, and there is no formal process or expectation for Service Teams to open issues on the repo for requests (some do get opened, but usually only for feature requests like adding new properties/resources, upgrading the API version, etc.). Taking your feedback on board, I will see if we can formalise a better process here; at the very least we can open an issue on behalf of the request in order to link to it from the resource documentation for better visibility and communication.

@jemag this is indeed an unfortunate situation. As I mentioned, I'm still waiting on a response from the service team about this, and I don't expect one until the new year, given we're nearing the end-of-year holiday period for many people.

stephybun, Dec 04 '24

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

github-actions[bot], Jan 04 '25

The AKS team has announced in https://github.com/Azure/AKS/issues/2729#issuecomment-2819217349 that this feature will be going to GA on May 19.

I hope that this can be implemented in the provider soon after the release.

tspearconquest, Apr 28 '25

This feature is in the 2025-03-01 Stable API, so it could be implemented any time, though I recommend a note in the docs about limited availability (link to here: https://learn.microsoft.com/en-us/azure/aks/api-server-vnet-integration)

rowanmoul, May 22 '25

Is there any estimated date for this integration to be added back?

demian711, Jun 05 '25

Any updates regarding this?

Robsta86, Jun 19 '25

This feature depends on issue-1187 being resolved first. Please consider upvoting that issue to help prioritize it for provider v4 reintegration.

thpham, Jun 22 '25

@thpham Microsoft.ContainerService API version 2025-05-01 was added in https://github.com/hashicorp/pandora/pull/4883 a couple of days ago. This added support for vnetIntegration and thus unblocks this ticket.

nclaeys, Aug 19 '25

Hello.

Do we have an estimated date for when vnet_integration_enabled will be re-integrated following the Microsoft update?

Thanks.

jcamu, Aug 22 '25

There is a PR open to fix the issue: https://github.com/hashicorp/terraform-provider-azurerm/pull/30496

stijndehaes, Sep 12 '25

The following PR got merged: https://github.com/hashicorp/terraform-provider-azurerm/pull/30559

The latest version, v4.46.0, includes the changes.
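For anyone upgrading, here is a minimal sketch of what the re-added configuration might look like. It assumes the arguments are named `subnet_id` and `virtual_network_integration_enabled` inside `api_server_access_profile`; the latter name is taken from the confirmation later in this thread. Check the v4.46.0 `azurerm_kubernetes_cluster` documentation for the exact schema. The variable name is a hypothetical placeholder.

```hcl
terraform {
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = ">= 4.46.0" # release reported in this thread to include PR #30559
    }
  }
}

resource "azurerm_kubernetes_cluster" "k8s" {
  # name, location, resource_group_name, dns_prefix, default_node_pool, identity, ... omitted

  api_server_access_profile {
    subnet_id                           = var.api_server_subnet_id # hypothetical variable
    virtual_network_integration_enabled = true                     # assumed 4.x name for vnet_integration_enabled
  }
}
```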

stijndehaes, Oct 01 '25

I can confirm that I was able to upgrade our AzureRM provider to v4 and successfully run terraform apply against our virtual_network_integration_enabled AKS clusters in EastUS. 🎉🥳

tspearconquest, Oct 21 '25

Yes I've been using the latest 4.x since stijndehaes posted three weeks ago. No problems at all. This issue can be closed.

rowanmoul, Oct 21 '25

Thanks for taking the time to submit this issue. It looks like this has been resolved as of #30559. As such, I am going to mark this issue as closed. If that is not the case, please provide additional information including the version in which you are still experiencing this issue, thanks!

rcskosir, Oct 21 '25

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

github-actions[bot], Nov 21 '25