Adding Scale Out functionality

Open radez opened this issue 9 months ago • 10 comments

  • Add nodes to worker inventory section and update vars in scaleout.yml to add nodes to the existing cluster.
  • https://docs.openshift.com/container-platform/4.17/nodes/nodes/nodes-nodes-adding-node-iso.html

radez avatar Feb 20 '25 13:02 radez

/test ?

josecastillolema avatar Feb 20 '25 15:02 josecastillolema

@josecastillolema: The following commands are available to trigger required jobs:

/test deploy-5nodes
/test deploy-5nodes-dev
/test deploy-sno
/test deploy-sno-dev

In response to this:

/test ?

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

openshift-ci[bot] avatar Feb 20 '25 15:02 openshift-ci[bot]

/test deploy-sno

josecastillolema avatar Feb 20 '25 15:02 josecastillolema

/test deploy-5nodes

akrzos avatar Feb 24 '25 19:02 akrzos

/test deploy-5nodes

The test failed because of:

   * could not run steps: step deploy-5nodes failed: failed to create credentials: could not read source credential: secrets "perfscale-metal-bastion" not found

Let me take care of this tomorrow morning; we need to update the secrets. P.S. Even with the updated secrets we will still have the route issue :/ cc @akrzos

josecastillolema avatar Feb 24 '25 19:02 josecastillolema

OK, I was also going to look into the route issue more this afternoon.

akrzos avatar Feb 24 '25 19:02 akrzos

Should be fixed when https://github.com/openshift/release/pull/62015 merges

josecastillolema avatar Feb 25 '25 11:02 josecastillolema

/test deploy-5nodes

akrzos avatar Feb 25 '25 19:02 akrzos

/test deploy-5nodes

akrzos avatar Feb 27 '25 20:02 akrzos

@radez: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name              Commit                                    Details  Required  Rerun command
ci/prow/deploy-sno     7bb5d2be748fe377d204e8a588e0b367fa5ecc19  link     true      /test deploy-sno
ci/prow/deploy-5nodes  7bb5d2be748fe377d204e8a588e0b367fa5ecc19  link     true      /test deploy-5nodes

openshift-ci[bot] avatar Feb 27 '25 20:02 openshift-ci[bot]

This needs to be rebased to pick up the fix in #619 for CI to work

akrzos avatar Feb 27 '25 21:02 akrzos

Turns out we do need all.yml: there's a config directory var that I used to hold the generated ISO.

radez avatar Mar 18 '25 14:03 radez

Was able to run an initial deployment with 3 nodes and then scale up to 6 nodes:

# oc get no
NAME               STATUS   ROLES                  AGE     VERSION
e38-h02-000-r650   Ready    control-plane,master   35m     v1.31.6
e38-h03-000-r650   Ready    control-plane,master   52m     v1.31.6
e38-h06-000-r650   Ready    control-plane,master   52m     v1.31.6
vm00001            Ready    worker                 38m     v1.31.6
vm00002            Ready    worker                 38m     v1.31.6
vm00003            Ready    worker                 38m     v1.31.6
vm00004            Ready    worker                 4m37s   v1.31.6
vm00005            Ready    worker                 4m39s   v1.31.6
vm00006            Ready    worker                 4m40s   v1.31.6

Basic process was:

  1. Deploy with worker_node_count: 0 and hybrid_worker_count: 3
  2. Rerun create-inventory playbook with hybrid_worker_count: 6
  3. Copy the sample scale up vars and edit ansible/vars/scale_out.yml
  4. Run mno-scale-out.yml

Playbook ran for 10m 28s for the 3 node scale up.
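
One way to confirm a scale-up like the one above is to count Ready workers in the `oc get no` output before and after running the playbook. The following is a minimal sketch, assuming the output has been captured as plain text; the function name `ready_nodes_by_role` and the `SAMPLE` constant are hypothetical and not part of Jetlag.

```python
def ready_nodes_by_role(oc_output: str) -> dict[str, list[str]]:
    """Group Ready node names by role, given the text of `oc get no`."""
    roles: dict[str, list[str]] = {}
    for line in oc_output.strip().splitlines():
        fields = line.split()
        # Skip the header row and any line that is not a full node row.
        if len(fields) < 5 or fields[0] == "NAME":
            continue
        name, status, role_field = fields[0], fields[1], fields[2]
        # STATUS may be a comma-separated list such as "Ready,SchedulingDisabled".
        if "Ready" not in status.split(","):
            continue
        # ROLES is comma-separated as well, e.g. "control-plane,master".
        for role in role_field.split(","):
            roles.setdefault(role, []).append(name)
    return roles


# Output pasted from the comment above (3 control-plane nodes, 6 workers).
SAMPLE = """\
NAME               STATUS   ROLES                  AGE     VERSION
e38-h02-000-r650   Ready    control-plane,master   35m     v1.31.6
e38-h03-000-r650   Ready    control-plane,master   52m     v1.31.6
e38-h06-000-r650   Ready    control-plane,master   52m     v1.31.6
vm00001            Ready    worker                 38m     v1.31.6
vm00002            Ready    worker                 38m     v1.31.6
vm00003            Ready    worker                 38m     v1.31.6
vm00004            Ready    worker                 4m37s   v1.31.6
vm00005            Ready    worker                 4m39s   v1.31.6
vm00006            Ready    worker                 4m40s   v1.31.6
"""

nodes = ready_nodes_by_role(SAMPLE)
workers = nodes.get("worker", [])
print(f"{len(workers)} Ready workers: {', '.join(workers)}")
```

Comparing `len(workers)` against the `hybrid_worker_count` used in step 2 gives a simple pass/fail check that could also be run at the end of an automated test.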

akrzos avatar Mar 19 '25 19:03 akrzos

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: akrzos

openshift-ci[bot] avatar Mar 19 '25 21:03 openshift-ci[bot]

I think it would be nice to have a Prow test for this feature in the Jetlag CI; it could deploy a 3+1 cluster and then scale out by one node.

josecastillolema avatar Mar 20 '25 08:03 josecastillolema