EPIC: Upgrade EOL Ubuntu Machines To New 24.04 LTS Version

Open steelhead31 opened this issue 1 year ago • 5 comments

Carry out an upgrade on all EOL Ubuntu machines to the most recent LTS release (24.04), and also upgrade some of the existing Ubuntu 20.04 and 22.04 machines to provide greater coverage.

Machines Identified:

  • ubuntu1604-x64-1: {ip: 78.47.239.96, description: nagios.adoptopenjdk.net}
  • vagrant-x64-1: {ip: 150.239.60.120, description: Bare metal machine to run vagrantPlaybookCheck and qemuPlaybookCheck}
  • [x] https://github.com/adoptium/infrastructure/issues/3501
  • [x] https://github.com/adoptium/infrastructure/issues/3589
  • [x] https://github.com/adoptium/infrastructure/issues/3598
  • [x] https://github.com/adoptium/infrastructure/issues/3693
  • [x] https://github.com/adoptium/infrastructure/issues/3692
  • [x] https://github.com/adoptium/infrastructure/issues/3589
  • [x] https://github.com/adoptium/infrastructure/issues/3577
  • [x] https://github.com/adoptium/infrastructure/issues/3729

steelhead31 avatar Jun 10 '24 13:06 steelhead31

Other machines that should be upgraded as part of this:

| Host | Current OS | Status |
| --- | --- | --- |
| ci.adoptium.net (Primary jenkins server) | Ubuntu 20.04 | Separate issue |
| dockerhost-azure-ubuntu2204-x64-2 | Ubuntu 22.04 | [§] |
| dockerhost-equinix-ubuntu2004-armv8-1 | Ubuntu 20.04 | [§] |
| dockerhost-osuosl-ubutu2004-ppc64le-1 | Ubuntu 20.04 | [§] |
| test-ibmcloud-ubuntu1604-x64-1 | Ubuntu 16.04 | [†] |
| test-osuosl-ubuntu1604-ppc64le-1 | Ubuntu 16.04 | [†] |
| test-osuosl-ubuntu1604-ppc64le-2 | Ubuntu 16.04 | [†] |
| test-osuosl-ubuntu1804-ppc64le-1 | Ubuntu 18.04 | [†] |
| test-osuosl-ubuntu1804-ppc64le-2 | Ubuntu 18.04 | [†] |
| test-skytap-ubuntu2004-ppc64le-1 | Ubuntu 20.04 | |
| test-osuosl-ubuntu2004-ppc64le-1 | Ubuntu 20.04 | |

[§] - Updating the dockerhosts to 24.04 will mean that the kernel will be suitable for any newer docker containers that we wish to run.

[†] - These machines are running a version which is now out of standard support.

sxa avatar Jun 11 '24 09:06 sxa

ref Upgrade/Rebuild IBM VPC Host To Ubuntu 24.04

I managed to get the VPC machine, 150.239.60.120, into a state where it is receiving updates, but I am hitting dependency errors. Running apt --fix-broken install gives:

******************************************************************************
*
* The base-files package cannot be installed because
* /bin is a directory, but should be a symbolic link.
*
* Please install the usrmerge package to convert this system to merged-/usr.
*
* For more information please read https://wiki.debian.org/UsrMerge.
*
******************************************************************************


dpkg: error processing archive /var/cache/apt/archives/base-files_13.5_amd64.deb (--unpack):
 new base-files package pre-installation script subprocess returned error exit status 1
Errors were encountered while processing:
 /var/cache/apt/archives/base-files_13.5_amd64.deb
E: Sub-process /usr/bin/dpkg returned an error code (1)

I then tried apt install usrmerge to solve this, but hit the same dependency errors again, so the two fixes block each other in a cycle. I recommend that the machine be rebuilt.
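For anyone checking other hosts before attempting the same upgrade, here is a minimal sketch of the condition the base-files pre-install script is complaining about. It only tests whether the top-level directories are real directories instead of symlinks. The function name and the list of directories are my own assumptions for illustration, not something taken from the package scripts.

```python
import os

# Top-level directories expected to be symlinks into /usr on a merged-/usr
# system. This list is an assumption for illustration; the error above only
# names /bin explicitly.
MERGED_DIRS = ("bin", "sbin", "lib")


def unmerged_dirs(root="/"):
    """Return top-level paths under root that are real directories, not symlinks."""
    result = []
    for name in MERGED_DIRS:
        path = os.path.join(root, name)
        # isdir() follows symlinks, so islink() is needed to tell the two apart.
        if os.path.isdir(path) and not os.path.islink(path):
            result.append(path)
    return result
```

An empty list from `unmerged_dirs()` would suggest the host is already merged-/usr, while a list containing `/bin` matches the state this machine was stuck in.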

Haroon-Khel avatar Sep 12 '24 15:09 Haroon-Khel

@AdamBrousseau are you able to have this machine reinstalled with a Ubuntu2404 image?

Haroon-Khel avatar Sep 12 '24 15:09 Haroon-Khel

> @AdamBrousseau are you able to have this machine reinstalled with a Ubuntu2404 image?

FYI @AdamBrousseau has reinstalled the ibmcloud vagrant host and once he's got the keys on it (I've asked him to put mine on, then I'll add the others) we'll be able to do the setup.

sxa avatar Oct 07 '24 14:10 sxa

> FYI @AdamBrousseau has reinstalled the ibmcloud vagrant host and once he's got the keys on it (I've asked him to put mine on, then I'll add the others) we'll be able to do the setup.

Done

AdamBrousseau avatar Oct 07 '24 15:10 AdamBrousseau

Noting that in addition to the list in the earlier comment there are ten test-docker machines which are Ubuntu 20.04.

I've added test-osuosl-ubuntu2004-ppc64le-1 and test-skytap-ubuntu2004-ppc64le-1 to the list above too.

The odroid ones on Ubuntu 20.04 will hopefully become irrelevant when https://github.com/adoptium/infrastructure/issues/3043 is closed.

sxa avatar Feb 05 '25 11:02 sxa

This is being blocked by https://github.com/adoptium/infrastructure/issues/3547 (for the test-docker nodes only), since the untar error is preventing us from upgrading our test-docker Ubuntu 20.04 containers to 24.04 on ppc64le and arm32. In https://github.com/adoptium/infrastructure/issues/3547#issuecomment-2649063527 I have recommended upgrading the docker engine to >= 25.0.3, so I will try that to see if it fixes the issue.

Haroon-Khel avatar Feb 10 '25 19:02 Haroon-Khel

> I have recommended upgrading the docker engine to >= 25.0.3, so I will try that to see if it fixes the issue

Which OSs are the problematic docker host systems running?

The default docker.io package, even back to Ubuntu 20.04, seems to be Docker version 26.1.3, build 26.1.3-0ubuntu1~20.04.1, which should meet that requirement. We may have blocked regular updates to that package, though, to prevent it automatically causing an outage on the containers (I seem to recall that has happened in the past).

sxa avatar Feb 10 '25 19:02 sxa

> Which OSs are the problematic docker host systems running?

dockerhost-osuosl-ubuntu2404-ppc64le-1 and dockerhost-skytap-ubuntu2004-ppc64le-1 (which needs to be upgraded to Ubuntu 24.04 anyway) are running:

Server:
 Engine:
  Version:          24.0.7

dockerhost-skytap-ubuntu2004-ppc64le-1 needs its libseccomp2 upgraded to >= 2.5.5 as it is suspected that this is also causing the tar error in https://github.com/adoptium/infrastructure/issues/3547

The problematic arm64 dockerhost, dockerhost-equinix-ubuntu2204-armv8-1, is running docker >= 27, but its libseccomp2 still needs to be upgraded to >= 2.5.5. More details are in https://github.com/adoptium/infrastructure/issues/3547#issuecomment-2649087432
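Since we now have two minimums to track per host (docker engine >= 25.0.3 and libseccomp2 >= 2.5.5), here is a rough sketch of how the comparison could be scripted. The helper names are hypothetical, and it only compares the leading dotted numbers of a version string, so it ignores Debian epochs and package revisions.

```python
import re


def version_tuple(version):
    """Turn '26.1.3-0ubuntu1~20.04.1' or '2.5.5' into a tuple of leading integers."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"no leading version number in {version!r}")
    return tuple(int(part) for part in match.group().split("."))


def meets_minimum(installed, minimum):
    """True when the installed version is at least the minimum (numeric compare)."""
    return version_tuple(installed) >= version_tuple(minimum)
```

With the values from this thread, `meets_minimum("24.0.7", "25.0.3")` is false for the two ppc64le dockerhosts, while the 26.1.3 package mentioned earlier would pass.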

Haroon-Khel avatar Feb 10 '25 20:02 Haroon-Khel

Yep, see my suggestion on libseccomp. It would be nice if that wasn't a blocker.

We'll need to see why the machines locked at an earlier version are stuck there. Hopefully we can just have an outage and manually update them, but we should also check whether they are held back because we've stopped them updating for some other reason.

sxa avatar Feb 10 '25 21:02 sxa

All of the test-docker-ubuntu2004 nodes have been upgraded to ubuntu 2404

Haroon-Khel avatar Feb 11 '25 15:02 Haroon-Khel

Reiterating https://github.com/adoptium/infrastructure/issues/3547#issuecomment-2663722626

Attempted the OS upgrade on dockerhost-skytap-ubuntu2004-ppc64le-1, but it stopped midway due to a lack of disk space on /. I'm unable to increase the size of /. A solution could be to recreate this VM in the skytap console.

EDIT: I think the skytap console allows me to increase / while the machine is offline

Haroon-Khel avatar Feb 17 '25 17:02 Haroon-Khel

I've increased the / space to 100G, but now it is specifically requesting more /boot space:

Not enough free disk space 

The upgrade has aborted. The upgrade needs a total of 132 M free 
space on disk '/boot'. Please free at least an additional 68.4 M of 
disk space on '/boot'. You can remove old kernels using 'sudo apt 
autoremove' and you could also set COMPRESS=xz in 
/etc/initramfs-tools/initramfs.conf to reduce the size of your 
initramfs. 
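To check the /boot headroom on a host before starting the upgrade, something like the following minimal sketch could be used. The function names are hypothetical, and treating the upgrader's "M" as MiB is an assumption on my part.

```python
import shutil

MIB = 1024 * 1024


def shortfall_mib(required_mib, free_bytes):
    """Return how many MiB still need to be freed (0.0 if there is enough room)."""
    return max(0.0, required_mib - free_bytes / MIB)


def boot_shortfall_mib(required_mib, path="/boot"):
    """Same check against the live filesystem mounted at path."""
    return shortfall_mib(required_mib, shutil.disk_usage(path).free)
```

Calling `boot_shortfall_mib(132)` on the host would give a figure comparable to the one in the message above, which can be re-checked after each round of kernel cleanup.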

Haroon-Khel avatar Feb 18 '25 11:02 Haroon-Khel

I've removed some of the older kernel files in /boot with apt purge to free up space there. The upgrade to Ubuntu 22.04 (on the way to 24.04) is underway.

Haroon-Khel avatar Feb 18 '25 12:02 Haroon-Khel

Upgraded to ubuntu 22.04, but now need more space on /boot for the 24.04 upgrade

The upgrade has aborted. The upgrade needs a total of 179 M free 
space on disk '/boot'. Please free at least an additional 146 M of 
disk space on '/boot'. You can remove old kernels using 'sudo apt 
autoremove' and you could also set COMPRESS=xz in 
/etc/initramfs-tools/initramfs.conf to reduce the size of your 
initramfs. 

Haroon-Khel avatar Feb 18 '25 13:02 Haroon-Khel

At this point I am comfortable keeping dockerhost-skytap-ubuntu2004-ppc64le-1 on ubuntu 22.04, since it is still in support. Its docker version has been upgraded to 26 and the machine is no longer suffering from the issues in https://github.com/adoptium/infrastructure/issues/3547

Haroon-Khel avatar Feb 18 '25 14:02 Haroon-Khel

I'm going to proceed with the upgrades of the following machines to Ubuntu 24.04:

test-osuosl-ubuntu1604-ppc64le-1 test-osuosl-ubuntu1604-ppc64le-2 test-osuosl-ubuntu1804-ppc64le-1 test-osuosl-ubuntu1804-ppc64le-2 test-osuosl-ubuntu2004-ppc64le-1

Haroon-Khel avatar Feb 19 '25 12:02 Haroon-Khel

Unfortunately these are all Power 8 machines, which cannot be upgraded past Ubuntu 20.04. These machines will need to be recreated in the osuosl power console with Power 9 CPUs.
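A quick way to spot this before starting an upgrade is to look at the CPU generation first. This is a minimal sketch that assumes /proc/cpuinfo on these ppc64le hosts has a line of the form `cpu : POWER8 (raw), altivec supported`. The function names are hypothetical.

```python
import re


def power_generation(cpuinfo_text):
    """Return the POWER generation (e.g. 8 or 9) named in cpuinfo text, or None."""
    match = re.search(r"^cpu\s*:\s*POWER(\d+)", cpuinfo_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def can_go_past_2004(cpuinfo_text):
    """True only for POWER9 or newer, per the constraint described above."""
    generation = power_generation(cpuinfo_text)
    return generation is not None and generation >= 9
```

Passing in the contents of /proc/cpuinfo from each host would flag the Power 8 machines up front, so they can go straight onto the recreate list.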

Haroon-Khel avatar Feb 19 '25 12:02 Haroon-Khel

https://ci.adoptium.net/computer/test-osuosl-ubuntu2404-ppc64le-2/ is replacing test-osuosl-ubuntu1604-ppc64le-1 AQA test pipeline https://ci.adoptium.net/job/AQA_Test_Pipeline/395/console

Haroon-Khel avatar Feb 19 '25 15:02 Haroon-Khel

test-osuosl-ubuntu1604-ppc64le-2 does not exist; perhaps it was deleted from the console and the inventory file was not updated. So I won't be replacing it with another machine.

https://ci.adoptium.net/computer/test-osuosl-ubuntu2404-ppc64le-3/ replaces test-osuosl-ubuntu1804-ppc64le-1 https://ci.adoptium.net/job/AQA_Test_Pipeline/398/console

Haroon-Khel avatar Feb 19 '25 16:02 Haroon-Khel

https://ci.adoptium.net/computer/test-osuosl-ubuntu2404-ppc64le-4/ replaces test-osuosl-ubuntu1804-ppc64le-2 https://ci.adoptium.net/job/AQA_Test_Pipeline/399/console

Haroon-Khel avatar Feb 19 '25 16:02 Haroon-Khel

https://ci.adoptium.net/computer/test-osuosl-ubuntu2404-ppc64le-5/ replaces test-osuosl-ubuntu2004-ppc64le-1 https://ci.adoptium.net/job/AQA_Test_Pipeline/400/console

That's all of the OSUOSL ppc64le machines.

Haroon-Khel avatar Feb 20 '25 13:02 Haroon-Khel

https://ci.adoptium.net/computer/test-skytap-ubuntu2404-ppc64le-1 replaces test-skytap-ubuntu2004-ppc64le-1

https://ci.adoptium.net/job/AQA_Test_Pipeline/401/console

Haroon-Khel avatar Feb 24 '25 18:02 Haroon-Khel

I think that's all of the machines in the list https://github.com/adoptium/infrastructure/issues/3588#issuecomment-2160182456

Haroon-Khel avatar Feb 24 '25 18:02 Haroon-Khel

I'll close this issue once https://github.com/adoptium/infrastructure/pull/3883 is merged

Haroon-Khel avatar Feb 24 '25 18:02 Haroon-Khel