terraform-provider-azurerm icon indicating copy to clipboard operation
terraform-provider-azurerm copied to clipboard

New resource: `azurerm_redhatopenshift_cluster`

Open fedeoliv opened this issue 3 years ago • 20 comments

azurerm_redhatopenshift_cluster

This PR introduces an Azure Red Hat OpenShift (ARO) provider targeting the issue #3614.

Known limitations

~~Currently the worker profile is immutable on the ARO API, more details here. In practice, it means that:~~

~~* Any change on the profile (e.g. nodes count, disk size) will require a cluster recreation.~~

  • Using oc to scale the MachineSet may cause problems as the cluster state may not be represented in Terraform state.

Update: Immutable worker profile was associated with the API 2020-04-30 version. Not applicable anymore for the 2022-04-01 version.

fedeoliv avatar Feb 04 '22 12:02 fedeoliv

@fedeoliv @tombuildsstuff Just wonder what is the status of this PR. seems this is way off the main branch. Just wondering what was the reason this was not merged initially? I am interested in this feature as well as contributing.

datianshi avatar Jun 08 '22 13:06 datianshi

@datianshi - i believe it was because the worker profile is immutable so after terraform creates the resource there would be no good/easy way to handle any updates made to the worker profile outside of terraform.

katbyte avatar Jun 08 '22 20:06 katbyte

@katbyte Thanks for the reply. It makes sense. But I still think it has value just for provisioning. It is the nature of the resource. Up to the end-user to know the risks of changing/updating. End-user can use lifecycle management to prevent destroy as well.

datianshi avatar Jun 08 '22 20:06 datianshi

@katbyte does that mean this is not going to be added until RH changes the way RHO is deployed?

jgardner04 avatar Jun 13 '22 22:06 jgardner04

I agree with you @datianshi, I believe it's valuable providing the option of deploying ARO through Terraform even with this current limitation.

fedeoliv avatar Jun 14 '22 17:06 fedeoliv

Well, apparently the new RHO version on the Azure SDK supports mutable worker profile attributes, see the 2022-04-01 version here. The previous 2020-04-30 version had immutable properties (details for comparison here).

I'll update the resource to use this new version to validate if that works and make the required adjustments.

fedeoliv avatar Jun 14 '22 20:06 fedeoliv

FYI @fedeoliv I created an independent provider here https://registry.terraform.io/providers/rh-mobb/azureopenshift/0.0.2 This is with 2022-04-01 schema. Borrowed a lot of your code + the azurerm authentication. This is in my own repo. I think it should be part of the azurerm resource. Would sunset ours once this is merged and published.

datianshi avatar Jun 16 '22 16:06 datianshi

Awesome @datianshi, thanks for sharing this one! I'm working on updating the schema, CRUD methods and integration tests to match this new version. I'll keep you posted 😊.

fedeoliv avatar Jun 17 '22 12:06 fedeoliv

Hey @tombuildsstuff @katbyte @datianshi the PR is ready to be reviewed again 😊. I've updated the code and documentation to match the 2022-04-01 schema and introduced new acceptance tests for nodes encryption at host and FIPS cryptographic modules.

fedeoliv avatar Jul 01 '22 06:07 fedeoliv

what stops this branch from being approved and merged?

datianshi avatar Jul 20 '22 22:07 datianshi

what stops this branch from being approved and merged?

@tombuildsstuff wondering if you have any update on this review?

fedeoliv avatar Jul 24 '22 20:07 fedeoliv

@tombuildsstuff could you merge it?

piomin avatar Aug 08 '22 13:08 piomin

not sure what client id it is referring to?

@katbyte looking at the error message apparently we can't associate a service principal with multiple clusters (1-1 relationship). The options I have in mind is running the acceptance tests sequentially, or somehow generate new service principals for each test. Any suggestion?

fedeoliv avatar Aug 19 '22 12:08 fedeoliv

@fedeoliv - we can run them sequentially, also if that is a service limitation we should document it in the docs

katbyte avatar Aug 23 '22 23:08 katbyte

@fedeoliv - we can run them sequentially, also if that is a service limitation we should document it in the docs

Great, I updated the ARO documentation with a note about this limitation.

fedeoliv avatar Aug 25 '22 17:08 fedeoliv

@fedeoliv - we still need to change the tests to run sequentially

katbyte avatar Aug 29 '22 19:08 katbyte

@fedeoliv apologies for the contradiction - we can we update this to use a new Service Principal created as a part of each test - we can use the AzureAD provider to do that - see this test as an example - which'd mean we can run these in parallel and avoid this limitation.

tombuildsstuff avatar Aug 30 '22 14:08 tombuildsstuff

@fedeoliv apologies for the contradiction - we can we update this to use a new Service Principal created as a part of each test - we can use the AzureAD provider to do that - see this test as an example - which'd mean we can run these in parallel and avoid this limitation.

That's great, thanks for pointing an example @tombuildsstuff! I'll update the tests with your recommendation!

fedeoliv avatar Aug 31 '22 13:08 fedeoliv

Hi @fedeoliv Thanks for your work on this! I've also have been behind the scenes prodding the right folks to review this PR since we also use OpenShift for Consul K8s testing. Looking forward to see this get merged!

david-yu avatar Sep 09 '22 19:09 david-yu

Any ETA on this? @fedeoliv Amazing work this has been a long time coming.

mzarglis avatar Sep 21 '22 19:09 mzarglis

Hey @fedeoliv - seems like the tests are failing due to only one cluster being allowed. is this a service limit that can be increased or?

Test Failed

------- Stdout: -------
=== RUN   TestAccOpenShiftCluster_private
=== PAUSE TestAccOpenShiftCluster_private
=== CONT  TestAccOpenShiftCluster_private
=== CONT  TestAccOpenShiftCluster_private
    testcase.go:110: Step 1/1 error: Error running apply: exit status 1
        
        Error: creating Red Hat OpenShift Cluster "acctestaro220929200149494479" (Resource Group "acctestRG-aro-220929200149494479"): redhatopenshift.OpenShiftClustersClient#CreateOrUpdate: Failure sending request: StatusCode=400 -- Original Error: Code="DuplicateResourceGroup" Message="The provided resource group '/subscriptions/*******/resourcegroups/aro-cyufr257' already contains a cluster."
        
          with azurerm_redhatopenshift_cluster.test,
          on terraform_plugin_test.tf line 52, in resource "azurerm_redhatopenshift_cluster" "test":
          52: resource "azurerm_redhatopenshift_cluster" "test" {
        
--- FAIL: TestAccOpenShiftCluster_private (185.03s)

also we'll need to fix up the failing GHA CI

katbyte avatar Sep 29 '22 21:09 katbyte

s like the tests are failing due to only one cluster being allowed. is this a service limit that can be increased or

Hey @katbyte, humm seems this is a limitation. How did you get this error? Did it happen due to parallel tests execution? If yes, that's weird...I thought the random string value associated with a resource group name (e.g. accTestRG-aro-%d) wouldn't be a problem in terms of potential name conflicts.

fedeoliv avatar Oct 04 '22 19:10 fedeoliv

@fedeoliv - i'm not sure, it seems like all the tests are using acctestRG-aro-%d" so no idea where aro-cyufr257 is coming from? unless the service/API is doing something funky behind the scenes and not actually using the RG we specify?

katbyte avatar Oct 05 '22 01:10 katbyte

@fedeoliv Any idea if we're able to resolve the test failures so we can get this PR merged?

david-yu avatar Oct 11 '22 16:10 david-yu

@fedeoliv From chatting internally with some HashiCorp folks, it looks like a few clean up tasks are needed

go mod tidy go mod vendor should fix it up - the real issue here is the failing tests due to that random resource group conflict

david-yu avatar Oct 20 '22 21:10 david-yu

I am still seeing three distinct test failures for this PR: image

It seems like for some of the tests the SP needs additional permissions? could we update the tests and document that?

katbyte avatar Oct 21 '22 18:10 katbyte

Hi @fedeoliv do you know more help here to get this one through?

david-yu avatar Nov 15 '22 04:11 david-yu

@fedeoliv Pinging again, looks like some tests are still failing, did you need help understanding why these failures are occurring? Does OpenShift require a dedicated RG for testing on Azure? We currently use a shared RG here for tests based on chatting with the team.

david-yu avatar Dec 16 '22 21:12 david-yu

@david-yu - is there a resource group behind the scenes used by the service? because we seem to be passing in different one for each test and not defining "aro-xxxxx" anywhere in code?

katbyte avatar Jan 26 '23 18:01 katbyte

You need to define a resource group ahead of time before you provision an Azure RedHat Openshift cluster as per these docs: https://learn.microsoft.com/en-us/azure/openshift/tutorial-create-cluster. I typically create one myself manually using the CLI.

david-yu avatar Jan 26 '23 19:01 david-yu