upstream rebase and fixed hostname issue

Open mukrishn opened this issue 11 months ago • 8 comments

Description

Rebased from upstream openshift-kni/baremetal-deploy

Fix to disable the hostname assigned by the lab DHCP server and the public interface during first boot (see Slack thread)

Fixes # (issue)

Added a new nmstate config for day-1 installation (link)

Added a networkData secret to the BareMetalHost resource in the day-2 scaling playbook (config)
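
For readers unfamiliar with the mechanism, the following is a minimal sketch of how a BareMetalHost can reference a networkData secret. It is not the config from this PR. The host name is borrowed from the test output in this thread, the namespace and secret name are assumptions, and the secret payload is left as a placeholder because the actual content is not shown here.

```python
import json

# Assumptions for illustration only: namespace, secret name and host name.
NAMESPACE = "openshift-machine-api"
HOST = "worker000-fc640"
SECRET_NAME = f"{HOST}-network-data"

# Secret holding the network configuration under the "networkData" key.
# The payload is a placeholder; the real content lives in the PR's config.
secret = {
    "apiVersion": "v1",
    "kind": "Secret",
    "metadata": {"name": SECRET_NAME, "namespace": NAMESPACE},
    "type": "Opaque",
    "stringData": {"networkData": "<network data for this host>"},
}

# BareMetalHost fragment: spec.networkData points at the secret above.
# Other required fields (bmc, bootMACAddress, ...) are omitted for brevity.
bmh = {
    "apiVersion": "metal3.io/v1alpha1",
    "kind": "BareMetalHost",
    "metadata": {"name": HOST, "namespace": NAMESPACE},
    "spec": {
        "online": True,
        "networkData": {"name": SECRET_NAME, "namespace": NAMESPACE},
    },
}

# JSON manifests can be applied the same way as YAML ones.
print(json.dumps([secret, bmh], indent=2))
```

The point of the sketch is only the link between the two objects: the BareMetalHost carries a reference, and the per-host network settings live in the secret.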

Please select the appropriate options:

  • [x] Bug fix (non-breaking change which fixes an issue)
  • [ ] New feature (non-breaking change which adds functionality)
  • [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • [ ] This change requires a documentation update
  • [ ] This change is a documentation update

Testing

  • [x] 4.14.14
  • [x] 4.14.18

Test Configuration:

  • Versions: 4.14.14, 4.14.18
  • Lab: scale
  • Network: singlestack IPv4
  • Hardware: FC640, R640

Checklist

  • [ ] I have performed a self-review of my own code
  • [ ] If a change is adding a feature, it should require a change to the README.md and the review should catch this.
  • [ ] If the change is a fix, it should have an issue. The review should make sure the comments state the issue (not just the number) and it should use the keywords that will close the issue on merge.
  • [ ] A change should not be merged unless it passes CI or there is a comment/update saying what testing was passed.
  • [ ] PRs should not be merged unless positively reviewed.

mukrishn avatar Mar 05 '24 20:03 mukrishn

Tested on R650. Hit some lab issues, but was able to deploy all control plane nodes successfully.

$ oc get nodes
NAME                                        STATUS   ROLES                         AGE     VERSION
f04-h09-000-r640.rdu2.scalelab.redhat.com   Ready    control-plane,master,worker   7h18m   v1.27.10+28ed2d7
f04-h10-000-r640.rdu2.scalelab.redhat.com   Ready    control-plane,master,worker   7h18m   v1.27.10+28ed2d7
f04-h11-000-r640.rdu2.scalelab.redhat.com   Ready    control-plane,master,worker   7h18m   v1.27.10+28ed2d7

mukrishn avatar Mar 14 '24 04:03 mukrishn

Tested the deployment with the updated config:

$ oc get nodes 
NAME                                        STATUS   ROLES                         AGE     VERSION
master-0                                    Ready    control-plane,master,worker   14h     v1.27.10+28ed2d7
master-1                                    Ready    control-plane,master,worker   14h     v1.27.10+28ed2d7
master-2                                    Ready    control-plane,master,worker   14h     v1.27.10+28ed2d7

Worker scale-up is only partially tested; I need a lab allocation to test it thoroughly.

mukrishn avatar Mar 24 '24 22:03 mukrishn

Tested this on FC640s. Thanks @wilsondav for the lab env.

$ oc get nodes
NAME              STATUS   ROLES                  AGE     VERSION
master-0          Ready    control-plane,master   47m     v1.27.10+28ed2d7
master-1          Ready    control-plane,master   48m     v1.27.10+28ed2d7
master-2          Ready    control-plane,master   47m     v1.27.10+28ed2d7
worker000-fc640   Ready    worker                 11m     v1.27.10+28ed2d7
worker001-fc640   Ready    worker                 11m     v1.27.10+28ed2d7
worker002-fc640   Ready    worker                 11m     v1.27.10+28ed2d7
worker003-fc640   Ready    worker                 8m38s   v1.27.10+28ed2d7

mukrishn avatar Mar 26 '24 18:03 mukrishn

@josecastillolema @wilsondav please review

mukrishn avatar Mar 28 '24 13:03 mukrishn

Thanks @mukrishn, will validate the PR in the small VCP env.

josecastillolema avatar Mar 28 '24 19:03 josecastillolema

@wilsondav can you please paste here the errors you had with this PR in cloud18 and cloud26?

josecastillolema avatar Apr 19 '24 16:04 josecastillolema

Regarding the fixed hostname issue, it looks like fresh installs work fine but scale-ups lack the fix, i.e.:

e23-h24-b03-fc640.rdu2.scalelab.redhat.com   Ready    worker                 3h41m   v1.27.11+749fe1d
e23-h24-b04-fc640.rdu2.scalelab.redhat.com   Ready    worker                 3h40m   v1.27.11+749fe1d
master-0                                     Ready    control-plane,master   5h53m   v1.27.11+749fe1d
master-1                                     Ready    control-plane,master   5h53m   v1.27.11+749fe1d
master-2                                     Ready    control-plane,master   5h52m   v1.27.11+749fe1d
worker000-fc640                              Ready    worker                 5h16m   v1.27.11+749fe1d
worker001-fc640                              Ready    worker                 5h16m   v1.27.11+749fe1d
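
A check like this can be scripted. The following is a minimal sketch, assuming that a fixed hostname is a short name without dots and that a node still named by lab DHCP shows up as a fully qualified domain name. The function name is hypothetical; the sample lines are taken from the output above.

```python
from typing import List


def nodes_missing_fixed_hostname(oc_output: str) -> List[str]:
    """Return node names that look like lab FQDNs instead of fixed short names."""
    flagged = []
    for line in oc_output.splitlines():
        fields = line.split()
        # Skip blank lines, a pasted shell prompt line, and the header row.
        if not fields or fields[0] in ("$", "NAME"):
            continue
        name = fields[0]
        # Assumption: a dot in the node name means the fixed hostname was not applied.
        if "." in name:
            flagged.append(name)
    return flagged


sample = """\
e23-h24-b03-fc640.rdu2.scalelab.redhat.com   Ready    worker                 3h41m   v1.27.11+749fe1d
e23-h24-b04-fc640.rdu2.scalelab.redhat.com   Ready    worker                 3h40m   v1.27.11+749fe1d
master-0                                     Ready    control-plane,master   5h53m   v1.27.11+749fe1d
worker000-fc640                              Ready    worker                 5h16m   v1.27.11+749fe1d
"""

for node in nodes_missing_fixed_hostname(sample):
    print(node)
```

The output of `oc get nodes` can be piped into a script like this after a scale-up to list the nodes that need another look.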

Could the PR be split into two? One for the upstream rebase and another one for the fixed hostname issue?

Thanks

josecastillolema avatar Apr 22 '24 08:04 josecastillolema

@josecastillolema PR #307 is the rebase.

mukrishn avatar Apr 22 '24 14:04 mukrishn