assisted-service icon indicating copy to clipboard operation
assisted-service copied to clipboard

MGMT-17700: Assign none platform node-ips based on connected addresses and etcd restrictions

Open ori-amizur opened this issue 9 months ago • 10 comments

When none platform is in use, if there is ambiguity in node-ip assignment, then incorrect assignment might lead to installation failure. This happens when etcd detects that the socket address from an etcd node does not match the expected address in the peer certificate. In this case etcd rejects such connection.

Example: assuming two networks - net1 and net2. master node 1 has 1 address that belongs to net1. master node 2 has 2 addresses. one that belongs to net 1, and another that belongs to net 2 master node 3 has 1 address that belongs to net 1. If the selected node-ip of master node 2 belongs to net 2, then when it will create a connection with any other master node, the socket address will be the address that belongs to net 1. Since etcd expects it to be the same as the node-ip, it will reject the connection.

This can be solved by node-ips selection that will not cause such a conflict. Node ips assignment should be done through ignition. To correctly set bootstrap ip, the machine-network for the cluster must be set to match the selected node-ip for that host.

MGMT-17701: Add capability to calculate none platform node-ips based on L3 connected addresses and connectivity

Calculate the node-ips for none platform cluster. Node ip calculation is actually calculation of the node-ip, hint, and cidr. Node-ip is the ip address that exists on the host that was selected as the node-ip. Hint is actually an IP address that does not exist on the host, but must belong to the subnet of the node-ip. Cidr is the subnet in cidr notation that the node-ip and hint belong to. Node-ip calculation is either done for all cluster hosts or none of them.

MGMT-17702: Modify ignition to use calculated node-ips for none platform

In order to set node-ip, ignition is modified. A file called /etc/default/nodeip-configuration contains the hint as was set by node-ip generation. In addition the bootstrap-ip is set in ignition as was set in node-ip generation. This is set as environment variable called "OPENSHIFT_INSTALL_BOOTSTRAP_NODE_IP".

MGMT-17703: Modify machine-network based on calculated node-ips

The etcd in boostrap uses he machine-network to set the IP address. Therefore during install-config generation the machine-network may be replaced by the cidr from the node-ip generation.

List all the issues related to this PR

  • [x] New Feature
  • [ ] Enhancement
  • [ ] Bug fix
  • [ ] Tests
  • [ ] Documentation
  • [ ] CI/CD

What environments does this code impact?

  • [ ] Automation (CI, tools, etc)
  • [x] Cloud
  • [x] Operator Managed Deployments
  • [ ] None

How was this code tested?

  • [x] assisted-test-infra environment
  • [ ] dev-scripts environment
  • [ ] Reviewer's test appreciated
  • [ ] Waiting for CI to do a full test run
  • [ ] Manual (Elaborate on how it was tested)
  • [ ] No tests needed

Checklist

  • [ ] Title and description added to both, commit and PR.
  • [ ] Relevant issues have been associated (see CONTRIBUTING guide)
  • [ ] This change does not require a documentation update (docstring, docs, README, etc)
  • [ ] Does this change include unit-tests (note that code changes require unit-tests)

Reviewers Checklist

  • Are the title and description (in both PR and commit) meaningful and clear?
  • Is there a bug required (and linked) for this change?
  • Should this PR be backported?

/cc @tsorya /cc @paul-maidment

ori-amizur avatar May 02 '24 13:05 ori-amizur

@ori-amizur: This pull request references MGMT-17700 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.16.0" version, but no target version was set.

In response to this:

When none platform is in use, if there is ambiguity in node-ip assignment, then incorrect assignment might lead to installation failure. This happens when etcd detects that the socket address from an etcd node does not match the expected address in the peer certificate. In this case etcd rejects such connection.

Example: assuming to networks - net1 and net2. master node 1 has 1 address that belongs to net1. master node 2 has 2 addresses. one that belongs to net 1, and another that belongs to net 2 master node 3 has 1 address that belongs to net 1. If the selected node-ip of master node 2 belongs to net 2, then when it will create a connection with any other master node, the socket address will be the address that belongs to net 1. Since etcd expects it to be the same as the node-ip, it will reject the connection.

This can be solved by node-ips selection that will not cause such a conflict. Node ips assignment should be done through ignition. To correctly set bootstrap ip, the machine-network for the cluster must be set to match the selected node-ip for that host.

MGMT-17701: Add capability to calculate none platform node-ips based on L3 connected addresses and connectivity

Calculate the node-ips for none platform cluster. Node ip calculation is actually calculation of the node-ip, hint, and cidr. Node-ip is the ip address that exists on the host that was selected as the node-ip. Hint is actually an IP address that does not exist on the host, but must belong to the subnet of the node-ip. Cidr is the subnet in cidr notation that the node-ip and hint belong to. Node-ip calculation is either done for all cluster hosts or none of them.

MGMT-17702: Modify ignition to use calculated node-ips for none platform

In order to set node-ip, ignition is modified. A file called /etc/default/nodeip-configuration contains the hint as was set by node-ip generation. In addition the bootstrap-ip is set in ignition as was set in node-ip generation. This is set as environment variable called "OPENSHIFT_INSTALL_BOOTSTRAP_NODE_IP".

MGMT-17703: Modify machine-network based on calculated node-ips

The etcd in boostrap uses he machine-network to set the IP address. Therefore during install-config generation the machine-network may be replaced by the cidr from the node-ip generation.

List all the issues related to this PR

  • [x] New Feature
  • [ ] Enhancement
  • [ ] Bug fix
  • [ ] Tests
  • [ ] Documentation
  • [ ] CI/CD

What environments does this code impact?

  • [ ] Automation (CI, tools, etc)
  • [x] Cloud
  • [x] Operator Managed Deployments
  • [ ] None

How was this code tested?

  • [x] assisted-test-infra environment
  • [ ] dev-scripts environment
  • [ ] Reviewer's test appreciated
  • [ ] Waiting for CI to do a full test run
  • [ ] Manual (Elaborate on how it was tested)
  • [ ] No tests needed

Checklist

  • [ ] Title and description added to both, commit and PR.
  • [ ] Relevant issues have been associated (see CONTRIBUTING guide)
  • [ ] This change does not require a documentation update (docstring, docs, README, etc)
  • [ ] Does this change include unit-tests (note that code changes require unit-tests)

Reviewers Checklist

  • Are the title and description (in both PR and commit) meaningful and clear?
  • Is there a bug required (and linked) for this change?
  • Should this PR be backported?

/cc @tsorya /cc @paul-maidment

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

openshift-ci-robot avatar May 02 '24 13:05 openshift-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ori-amizur

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment Approvers can cancel approval by writing /approve cancel in a comment

openshift-ci[bot] avatar May 02 '24 13:05 openshift-ci[bot]

Codecov Report

Attention: Patch coverage is 78.12500% with 42 lines in your changes are missing coverage. Please review.

Project coverage is 68.43%. Comparing base (4f4c464) to head (c2e4687).

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #6257      +/-   ##
==========================================
+ Coverage   68.25%   68.43%   +0.17%     
==========================================
  Files         244      245       +1     
  Lines       35869    36060     +191     
==========================================
+ Hits        24483    24676     +193     
+ Misses       9223     9205      -18     
- Partials     2163     2179      +16     
Files Coverage Δ
internal/common/common.go 23.90% <0.00%> (-0.20%) :arrow_down:
internal/ignition/installmanifests.go 55.67% <66.66%> (+0.27%) :arrow_up:
...ernal/network/none_platform_node_ips_allocation.go 80.72% <80.72%> (ø)

... and 3 files with indirect coverage changes

codecov[bot] avatar May 02 '24 13:05 codecov[bot]

/test edge-subsystem-kubeapi-aws

ori-amizur avatar May 02 '24 16:05 ori-amizur

@ori-amizur: This pull request references MGMT-17700 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.16.0" version, but no target version was set.

In response to this:

When none platform is in use, if there is ambiguity in node-ip assignment, then incorrect assignment might lead to installation failure. This happens when etcd detects that the socket address from an etcd node does not match the expected address in the peer certificate. In this case etcd rejects such connection.

Example: assuming two networks - net1 and net2. master node 1 has 1 address that belongs to net1. master node 2 has 2 addresses. one that belongs to net 1, and another that belongs to net 2 master node 3 has 1 address that belongs to net 1. If the selected node-ip of master node 2 belongs to net 2, then when it will create a connection with any other master node, the socket address will be the address that belongs to net 1. Since etcd expects it to be the same as the node-ip, it will reject the connection.

This can be solved by node-ips selection that will not cause such a conflict. Node ips assignment should be done through ignition. To correctly set bootstrap ip, the machine-network for the cluster must be set to match the selected node-ip for that host.

MGMT-17701: Add capability to calculate none platform node-ips based on L3 connected addresses and connectivity

Calculate the node-ips for none platform cluster. Node ip calculation is actually calculation of the node-ip, hint, and cidr. Node-ip is the ip address that exists on the host that was selected as the node-ip. Hint is actually an IP address that does not exist on the host, but must belong to the subnet of the node-ip. Cidr is the subnet in cidr notation that the node-ip and hint belong to. Node-ip calculation is either done for all cluster hosts or none of them.

MGMT-17702: Modify ignition to use calculated node-ips for none platform

In order to set node-ip, ignition is modified. A file called /etc/default/nodeip-configuration contains the hint as was set by node-ip generation. In addition the bootstrap-ip is set in ignition as was set in node-ip generation. This is set as environment variable called "OPENSHIFT_INSTALL_BOOTSTRAP_NODE_IP".

MGMT-17703: Modify machine-network based on calculated node-ips

The etcd in boostrap uses he machine-network to set the IP address. Therefore during install-config generation the machine-network may be replaced by the cidr from the node-ip generation.

List all the issues related to this PR

  • [x] New Feature
  • [ ] Enhancement
  • [ ] Bug fix
  • [ ] Tests
  • [ ] Documentation
  • [ ] CI/CD

What environments does this code impact?

  • [ ] Automation (CI, tools, etc)
  • [x] Cloud
  • [x] Operator Managed Deployments
  • [ ] None

How was this code tested?

  • [x] assisted-test-infra environment
  • [ ] dev-scripts environment
  • [ ] Reviewer's test appreciated
  • [ ] Waiting for CI to do a full test run
  • [ ] Manual (Elaborate on how it was tested)
  • [ ] No tests needed

Checklist

  • [ ] Title and description added to both, commit and PR.
  • [ ] Relevant issues have been associated (see CONTRIBUTING guide)
  • [ ] This change does not require a documentation update (docstring, docs, README, etc)
  • [ ] Does this change include unit-tests (note that code changes require unit-tests)

Reviewers Checklist

  • Are the title and description (in both PR and commit) meaningful and clear?
  • Is there a bug required (and linked) for this change?
  • Should this PR be backported?

/cc @tsorya /cc @paul-maidment

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

openshift-ci-robot avatar May 08 '24 13:05 openshift-ci-robot

/test?

ori-amizur avatar May 08 '24 16:05 ori-amizur

@ori-amizur: The following commands are available to trigger required jobs:

  • /test e2e-agent-compact-ipv4
  • /test edge-assisted-operator-catalog-publish-verify
  • /test edge-ci-index
  • /test edge-e2e-ai-operator-ztp
  • /test edge-e2e-ai-operator-ztp-sno-day2-workers
  • /test edge-e2e-ai-operator-ztp-sno-day2-workers-late-binding
  • /test edge-e2e-metal-assisted
  • /test edge-e2e-metal-assisted-4-11
  • /test edge-e2e-metal-assisted-4-12
  • /test edge-e2e-metal-assisted-cnv
  • /test edge-e2e-metal-assisted-lvm
  • /test edge-e2e-metal-assisted-odf
  • /test edge-images
  • /test edge-lint
  • /test edge-subsystem-aws
  • /test edge-subsystem-kubeapi-aws
  • /test edge-unit-test
  • /test edge-verify-generated-code
  • /test images
  • /test mce-images

The following commands are available to trigger optional jobs:

  • /test e2e-agent-ha-dualstack
  • /test e2e-agent-sno-ipv6
  • /test edge-e2e-ai-operator-disconnected-capi
  • /test edge-e2e-ai-operator-ztp-3masters
  • /test edge-e2e-ai-operator-ztp-capi
  • /test edge-e2e-ai-operator-ztp-compact-day2-masters
  • /test edge-e2e-ai-operator-ztp-compact-day2-workers
  • /test edge-e2e-ai-operator-ztp-disconnected
  • /test edge-e2e-ai-operator-ztp-hypershift-zero-nodes
  • /test edge-e2e-ai-operator-ztp-multiarch-3masters-ocp
  • /test edge-e2e-ai-operator-ztp-multiarch-sno-ocp
  • /test edge-e2e-ai-operator-ztp-node-labels
  • /test edge-e2e-ai-operator-ztp-sno-day2-masters
  • /test edge-e2e-ai-operator-ztp-sno-day2-workers-ignitionoverride
  • /test edge-e2e-metal-assisted-4-13
  • /test edge-e2e-metal-assisted-4-14
  • /test edge-e2e-metal-assisted-4-15
  • /test edge-e2e-metal-assisted-bond
  • /test edge-e2e-metal-assisted-bond-4-14
  • /test edge-e2e-metal-assisted-day2
  • /test edge-e2e-metal-assisted-day2-arm-workers
  • /test edge-e2e-metal-assisted-day2-single-node
  • /test edge-e2e-metal-assisted-external
  • /test edge-e2e-metal-assisted-external-4-14
  • /test edge-e2e-metal-assisted-ipv4v6
  • /test edge-e2e-metal-assisted-ipv6
  • /test edge-e2e-metal-assisted-kube-api-late-binding-single-node
  • /test edge-e2e-metal-assisted-kube-api-late-unbinding-ipv4-single-node
  • /test edge-e2e-metal-assisted-kube-api-net-suite
  • /test edge-e2e-metal-assisted-mce-4-11
  • /test edge-e2e-metal-assisted-mce-4-12
  • /test edge-e2e-metal-assisted-mce-4-13
  • /test edge-e2e-metal-assisted-mce-4-14
  • /test edge-e2e-metal-assisted-mce-4-15
  • /test edge-e2e-metal-assisted-mce-sno
  • /test edge-e2e-metal-assisted-metallb
  • /test edge-e2e-metal-assisted-none
  • /test edge-e2e-metal-assisted-onprem
  • /test edge-e2e-metal-assisted-single-node
  • /test edge-e2e-metal-assisted-static-ip-suite
  • /test edge-e2e-metal-assisted-static-ip-suite-4-14
  • /test edge-e2e-metal-assisted-tang
  • /test edge-e2e-metal-assisted-tpmv2
  • /test edge-e2e-metal-assisted-upgrade-agent
  • /test edge-e2e-nutanix-assisted
  • /test edge-e2e-nutanix-assisted-2workers
  • /test edge-e2e-nutanix-assisted-4-14
  • /test edge-e2e-oci-assisted
  • /test edge-e2e-oci-assisted-4-14
  • /test edge-e2e-oci-assisted-iscsi
  • /test edge-e2e-vsphere-assisted
  • /test edge-e2e-vsphere-assisted-4-12
  • /test edge-e2e-vsphere-assisted-4-13
  • /test edge-e2e-vsphere-assisted-4-14
  • /test edge-e2e-vsphere-assisted-umn
  • /test okd-scos-images
  • /test push-pr-image

Use /test all to run the following jobs that were automatically triggered:

  • pull-ci-openshift-assisted-service-master-e2e-agent-compact-ipv4
  • pull-ci-openshift-assisted-service-master-edge-ci-index
  • pull-ci-openshift-assisted-service-master-edge-e2e-ai-operator-ztp
  • pull-ci-openshift-assisted-service-master-edge-e2e-metal-assisted
  • pull-ci-openshift-assisted-service-master-edge-e2e-metal-assisted-external
  • pull-ci-openshift-assisted-service-master-edge-e2e-metal-assisted-none
  • pull-ci-openshift-assisted-service-master-edge-e2e-nutanix-assisted
  • pull-ci-openshift-assisted-service-master-edge-e2e-nutanix-assisted-4-14
  • pull-ci-openshift-assisted-service-master-edge-e2e-oci-assisted
  • pull-ci-openshift-assisted-service-master-edge-e2e-oci-assisted-4-14
  • pull-ci-openshift-assisted-service-master-edge-e2e-vsphere-assisted
  • pull-ci-openshift-assisted-service-master-edge-images
  • pull-ci-openshift-assisted-service-master-edge-lint
  • pull-ci-openshift-assisted-service-master-edge-subsystem-aws
  • pull-ci-openshift-assisted-service-master-edge-subsystem-kubeapi-aws
  • pull-ci-openshift-assisted-service-master-edge-unit-test
  • pull-ci-openshift-assisted-service-master-edge-verify-generated-code
  • pull-ci-openshift-assisted-service-master-images
  • pull-ci-openshift-assisted-service-master-mce-images

In response to this:

/test?

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

openshift-ci[bot] avatar May 08 '24 16:05 openshift-ci[bot]

/test edge-e2e-metal-assisted-none

ori-amizur avatar May 08 '24 16:05 ori-amizur

/test edge-e2e-metal-assisted

ori-amizur avatar May 08 '24 16:05 ori-amizur

@ori-amizur: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/edge-e2e-vsphere-assisted a63c03f8b2f3458fce13ad2ed5e68dccabe5eb0f link false /test edge-e2e-vsphere-assisted
ci/prow/edge-e2e-oci-assisted a63c03f8b2f3458fce13ad2ed5e68dccabe5eb0f link false /test edge-e2e-oci-assisted
ci/prow/edge-e2e-oci-assisted-4-14 a63c03f8b2f3458fce13ad2ed5e68dccabe5eb0f link false /test edge-e2e-oci-assisted-4-14
ci/prow/edge-e2e-nutanix-assisted a63c03f8b2f3458fce13ad2ed5e68dccabe5eb0f link false /test edge-e2e-nutanix-assisted
ci/prow/edge-e2e-nutanix-assisted-4-14 a63c03f8b2f3458fce13ad2ed5e68dccabe5eb0f link false /test edge-e2e-nutanix-assisted-4-14

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

openshift-ci[bot] avatar May 08 '24 18:05 openshift-ci[bot]

/retest-required

ori-amizur avatar May 23 '24 06:05 ori-amizur

/retest-required

ori-amizur avatar May 23 '24 20:05 ori-amizur

/lgtm

tsorya avatar May 24 '24 14:05 tsorya

/retest-required

Remaining retests: 0 against base HEAD 4f4c46493887bed73e88b5661f4f2a503d5ccf35 and 2 for PR HEAD 7425516c2a60d230159a3a4f3addd30c2aca2e56 in total

openshift-ci-robot avatar May 24 '24 14:05 openshift-ci-robot

/retest-required

ori-amizur avatar May 24 '24 21:05 ori-amizur

/test edge-e2e-ai-operator-ztp

ori-amizur avatar May 25 '24 08:05 ori-amizur

@ori-amizur: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/edge-e2e-nutanix-assisted c2e46879bb54d033db33ca8ee65147f21b2f8037 link false /test edge-e2e-nutanix-assisted
ci/prow/edge-e2e-nutanix-assisted-4-14 c2e46879bb54d033db33ca8ee65147f21b2f8037 link false /test edge-e2e-nutanix-assisted-4-14
ci/prow/edge-e2e-oci-assisted c2e46879bb54d033db33ca8ee65147f21b2f8037 link false /test edge-e2e-oci-assisted
ci/prow/edge-e2e-vsphere-assisted c2e46879bb54d033db33ca8ee65147f21b2f8037 link false /test edge-e2e-vsphere-assisted

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

openshift-ci[bot] avatar May 25 '24 11:05 openshift-ci[bot]

/lgtm

tsorya avatar May 27 '24 09:05 tsorya

[ART PR BUILD NOTIFIER]

This PR has been included in build ose-agent-installer-api-server-container-v4.17.0-202405271211.p0.g8a63485.assembly.stream.el9 for distgit ose-agent-installer-api-server. All builds following this will include this PR.

openshift-bot avatar May 27 '24 12:05 openshift-bot