terraform-oci-oke
[v4][regression] Operator VM gets recreated at every non-OKE module update
On the latest 4.x branch, after an OKE cluster with an operator host has been created via terraform apply, running terraform plan reports that the oci_core_instance.operator resource must be replaced:
# module.oke-v4.module.operator[0].oci_core_instance.operator must be replaced
-/+ resource "oci_core_instance" "operator" {
~ availability_domain = "VMwN:UK-LONDON-1-AD-1" -> (known after apply) # forces replacement
~ boot_volume_id = "ocid1.bootvolume.oc1.uk-london-1.abwgiljs2vxcfitwxjkzrqyujiinwth3rq2w5mlkvbyad3ttjll25k27ahzq" -> (known after apply)
+ capacity_reservation_id = (known after apply)
+ dedicated_vm_host_id = (known after apply)
~ defined_tags = {} -> (known after apply)
- extended_metadata = {} -> null
~ fault_domain = "FAULT-DOMAIN-2" -> (known after apply)
~ hostname_label = "test-operator" -> (known after apply)
~ id = "ocid1.instance.oc1.uk-london-1.anwgiljss7djfsiccnnusqvn3rlazt2pl5vs5ttsqdd36bgyl44cdl5ogo7q" -> (known after apply)
~ image = "ocid1.image.oc1.uk-london-1.aaaaaaaa646hmq7yvlxk6wqhdzrljfxdy7iyy6wk7xtmdf3x73ko45nwqfsa" -> (known after apply)
+ ipxe_script = (known after apply)
+ is_pv_encryption_in_transit_enabled = (known after apply)
~ launch_mode = "PARAVIRTUALIZED" -> (known after apply)
~ metadata = {
- "ssh_authorized_keys" = <<-EOT
<SANITIZED>
EOT
- "user_data" = "H4sIAAAAAAAA/3RUXW/iPBO9j9T/MKKXXXCA8LnqSuE7QBpoAwt7UxnHJG4T29gOEPT++FfQ3W71PHpylfGcOT4zc+S+4IZyUw4LSbuQ5alhEiuDMnam0XfYiZxHWBWPJd/zh71g9TRwn7cl6xqV11RpJngXqhX7zrqzyuWvoDvrD/eAaSk0MzcsNgaTJKPcfIc9SynHGX0sCUkVNkJVCpylpb+locJc76kqDzkREeNxF1o7Zr4AbroNPRtEUpFHZSL4nsV3ls8y+i+F99AXslAsTgzU7GrrG9TsWvUbBAqTlEJfKCkUvgoFzCMkFOD9nqUMG6orAG6awq1Yg6KaqiONKtY9zBmhXNMIch5RBSahsOLsSJXGKSyoypjW7Ej/4OB4lQNYg07EiQM2kBgjdRchoXVF3LRUiMhQ+lGgUS5TZFn3Xzu0JCbvOKavuYwVjq5DUDm1DMvoRXDaBTfXRuGUYfRSRJwW1kkxQ1+vM9dd6x40NbkETRSTxgIog8Qm6UIJKSEM+rOQz5+KTkoWAID83ZDgugslu2XbH+f0c0Ol+MLkww5r2nQ+cuRjW1343y0EmDjac39/yC6WoYNCvzcYtZCzemvW7Om5qNeX/vrgUntdvDhk5nnhr+rU17PqfNsOG+OHxSo52NP28bLP0RPv2K2osRHq8BL6Yd6Wppi9jRYjskFOu9eaoP3YVc66vZ0X2XsaJDOW9bd558Vd1ga92WRWi8f7ZXsQHQqmN71dHrQWgxEaknRz9DWl/eBt/PBczZyZu247dS9cr1PHaQ1iO6j2+rL+wk/skI3V04MZN+TTIX0oiii8kA1K+OoSvjvbWugOGufTOX5Hi2bViHjSDPw+D/xE7M7PsuFOVRB5VX/kOYjKhvPLLUaXE6rXJjtvNFTJdkCaMmTMrlLldljQj3fCr9XTnbcmD03pH87rDg+ea0nrMMoGuzQbu/MFT4+t6WiQzX2aCKW9E4/CcNPx3JyqhgjHnXkwC875cTh9O5g6aTZHKECmFf0aXejTW+Ny+tlMshparlx3gZC7dF2EOixFP9+d09B13cdHS+WcZFHXgjJQkggo9W/uzBXjMXx6p1IpXRE7rBP4b3/9JaFnKZSBoO+99ufeq7sKJ4+Ma4M5oa9SMU6YxGkJfvwAQInIKBKSoMqVX5ErjRE5Sb6kPm/ZM8508s+3qly+s/4fAAD//7qFZ0wKBQAA"
} -> (known after apply) # forces replacement
~ private_ip = "10.0.0.10" -> (known after apply)
+ public_ip = (known after apply)
~ region = "uk-london-1" -> (known after apply)
~ subnet_id = "ocid1.subnet.oc1.uk-london-1.aaaaaaaadabk34qp3fkfwi2ur7dk5e37q34hpn4i3nbgauvqszcc4zi67a4q" -> (known after apply)
~ system_tags = {} -> (known after apply)
~ time_created = "2021-10-14 07:47:03.581 +0000 UTC" -> (known after apply)
+ time_maintenance_reboot_due = (known after apply)
# (5 unchanged attributes hidden)
~ agent_config {
# (3 unchanged attributes hidden)
- plugins_config {
- desired_state = "ENABLED" -> null
- name = "Management Agent" -> null
}
# (1 unchanged block hidden)
}
~ availability_config {
~ is_live_migration_preferred = false -> (known after apply)
~ recovery_action = "RESTORE_INSTANCE" -> (known after apply)
}
~ create_vnic_details {
- assign_private_dns_record = false -> null
~ defined_tags = {} -> (known after apply)
~ freeform_tags = {
- "environment" = "dev"
- "role" = "operator"
} -> (known after apply)
~ private_ip = "10.0.0.10" -> (known after apply)
~ skip_source_dest_check = false -> (known after apply)
+ vlan_id = (known after apply)
# (5 unchanged attributes hidden)
}
~ instance_options {
~ are_legacy_imds_endpoints_disabled = false -> (known after apply)
}
~ launch_options {
~ firmware = "UEFI_64" -> (known after apply)
~ is_consistent_volume_naming_enabled = true -> (known after apply)
~ is_pv_encryption_in_transit_enabled = false -> (known after apply)
~ remote_data_volume_type = "PARAVIRTUALIZED" -> (known after apply)
# (2 unchanged attributes hidden)
}
+ platform_config {
+ is_measured_boot_enabled = (known after apply)
+ is_secure_boot_enabled = (known after apply)
+ is_trusted_platform_module_enabled = (known after apply)
+ numa_nodes_per_socket = (known after apply)
+ type = (known after apply)
}
+ preemptible_instance_config {
+ preemption_action {
+ preserve_boot_volume = (known after apply)
+ type = (known after apply)
}
}
~ shape_config {
+ baseline_ocpu_utilization = (known after apply)
+ gpu_description = (known after apply)
~ gpus = 0 -> (known after apply)
+ local_disk_description = (known after apply)
~ local_disks = 0 -> (known after apply)
~ local_disks_total_size_in_gbs = 0 -> (known after apply)
~ max_vnic_attachments = 2 -> (known after apply)
~ networking_bandwidth_in_gbps = 1 -> (known after apply)
~ processor_description = "2.25 GHz AMD EPYC™ 7742 (Rome)" -> (known after apply)
# (2 unchanged attributes hidden)
}
~ source_details {
~ boot_volume_size_in_gbs = "47" -> (known after apply)
+ kms_key_id = (known after apply)
# (2 unchanged attributes hidden)
}
# (1 unchanged block hidden)
}
Not sure what has changed, but this is a regression: previously the operator host remained unchanged throughout the lifecycle of an OKE cluster.
I'm unable to replicate this behaviour.
Closed due to lack of activity. Please reopen if this is still impacting you
@hyder
We faced a similar issue. After a lot of debugging, it turned out that the change below (which wasn't caused by any code change; it just shows up seemingly at random) caused the plan to report 4 to add, 30 to change, and 3 to destroy.
After removing all the depends_on = [module.vcn] entries in main.tf, the plan had only one update; the rest of the changes were gone.
https://itnext.io/beware-of-depends-on-for-modules-it-might-bite-you-da4741caac70
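To illustrate the pattern described above (module and output names here are illustrative, not taken from this repo):

```hcl
# Before (problematic): a module-level depends_on makes every resource and
# data source inside module "oke" wait on ALL of module.vcn, so any change
# in the VCN module turns otherwise-known attributes into
# "(known after apply)" and can cascade into replacements downstream.
#
#   module "oke" {
#     source     = "./modules/oke"
#     depends_on = [module.vcn]   # blanket dependency -- avoid
#   }

# After: pass only the specific outputs you need. Terraform then tracks
# fine-grained dependencies on those values instead of imposing a
# whole-module barrier.
module "oke" {
  source    = "./modules/oke"
  vcn_id    = module.vcn.vcn_id             # illustrative output names
  subnet_id = module.vcn.operator_subnet_id
}
```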
I am not sure what's causing the security list to change, but I want to leave this comment to help anyone who stumbles upon this issue like I did.
# module.terraform-oci-oke.module.vcn[0].oci_core_default_security_list.lockdown[0] will be updated in-place
~ resource "oci_core_default_security_list" "lockdown" {
id = "REDACTED"
# (7 unchanged attributes hidden)
- egress_security_rules {
- destination = "172.16.64.0/18" -> null
- destination_type = "CIDR_BLOCK" -> null
- protocol = "6" -> null
- stateless = false -> null
- tcp_options {
- max = 10256 -> null
- min = 10256 -> null
}
}
- egress_security_rules {
- destination = "172.16.64.0/18" -> null
- destination_type = "CIDR_BLOCK" -> null
- protocol = "6" -> null
- stateless = false -> null
- tcp_options {
- max = 31440 -> null
- min = 31440 -> null
}
}
- ingress_security_rules {
- protocol = "6" -> null
- source = "0.0.0.0/0" -> null
- source_type = "CIDR_BLOCK" -> null
- stateless = false -> null
- tcp_options {
- max = 80 -> null
- min = 80 -> null
}
}
- ingress_security_rules {
- protocol = "6" -> null
- source = "172.16.2.32/27" -> null
- source_type = "CIDR_BLOCK" -> null
- stateless = false -> null
- tcp_options {
- max = 10256 -> null
- min = 10256 -> null
}
}
- ingress_security_rules {
- protocol = "6" -> null
- source = "172.16.2.32/27" -> null
- source_type = "CIDR_BLOCK" -> null
- stateless = false -> null
- tcp_options {
- max = 31440 -> null
- min = 31440 -> null
}
}
}
I'm reopening the issue
hi @bader-tayeb,
Thanks for notifying us of this issue. Can I check whether you created a Service of type LoadBalancer by any chance? You may have created it manually, or deployed an ingress controller or a packaged Helm chart that caused a LoadBalancer to be created.
@hyder
After looking into it: yes, we had created a Service of type LoadBalancer, and that caused the "oci_core_default_security_list" "lockdown" resource to show a change in the Terraform plan.
But it was difficult to spot this change because the depends_on caused the plan to report 30+ changes instead of just the one.
@bader-tayeb thanks for confirming. While we investigate a solution, when you create the load balancer, can you please set the load balancer annotations so that the security list management mode is "None", and also specify the NSG? This will keep your load balancer healthy without modifying the default security list.
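A minimal sketch of such a Service, assuming the OCI cloud-controller-manager annotations for security list management mode and NSGs (the NSG OCID, service name, and ports below are placeholders):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-app            # placeholder name
  annotations:
    # Tell the OCI cloud controller NOT to manage security list rules
    # for this load balancer.
    service.beta.kubernetes.io/oci-load-balancer-security-list-management-mode: "None"
    # Attach the load balancer to an NSG instead (placeholder OCID).
    oci.oraclecloud.com/oci-network-security-groups: "ocid1.networksecuritygroup.oc1..example"
spec:
  type: LoadBalancer
  selector:
    app: my-app
  ports:
    - port: 80
      targetPort: 8080
```

With management mode set to "None", creating the LoadBalancer should no longer mutate the default security list out of band, so the "lockdown" resource stops drifting.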
We too had a similar issue: the seclists wanted to change and the bastion wanted to be rebuilt on every apply.
It resolved itself when we moved to a VCN provisioned outside the module, so this might be another workaround for people reading this.
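A hedged sketch of that workaround, assuming the OKE module can consume an existing VCN (the variable and output names below are illustrative and may differ between module versions):

```hcl
# Provision the VCN outside the OKE module...
module "vcn" {
  source         = "oracle-terraform-modules/vcn/oci"
  compartment_id = var.compartment_id
  # ...
}

# ...and hand its ID to the OKE module, so the OKE module neither manages
# nor broadly depends on the VCN. Changes to the VCN then no longer
# cascade into the bastion/operator hosts.
module "oke" {
  source         = "oracle-terraform-modules/oke/oci"
  create_vcn     = false                # illustrative variable names
  vcn_id         = module.vcn.vcn_id
  compartment_id = var.compartment_id
  # ...
}
```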
@hyder we've used the annotations as suggested (link), but the issue still persists. Every time we create the load balancer, the default security list still gets changed.
There might be more than 1 issue in play. We'll try to identify and fix the problem(s).
Please bear with us.
@bader-tayeb @12345ieee:
I've just created a PR (#565) for this. Can you please try this in a new cluster and let us know if this fixes the issue?
hi @bader-tayeb @12345ieee,
Can you please test this PR before I merge? Otherwise, I'll assume it's working and go ahead.
@hyder
This is an acceptable workaround; it fixes the issue. But it invalidates the need for the "oci_core_default_security_list" "lockdown" resource, since I assume that resource's goal is to prevent exactly such changes.
Thanks for testing. It won't invalidate the need to lock down the default security list; it just won't trigger recreation of other resources, such as the bastion host, when the default security list is modified out of band, e.g. if you create a Service of type LoadBalancer without overriding the management mode to "None". We'll go ahead and merge then.