
Ubuntu 20.04 support

Status: Open · jabl opened this issue 4 years ago · 31 comments

Ubuntu 20.04 LTS was released today, and we're eager to see support for it in xCAT.

Unfortunately it seems they have ripped out the old debian-installer so it needs more work than merely updating the preseed file, as preseeds are no longer supported. Instead they have a new installation system called "autoinstall", see https://wiki.ubuntu.com/FoundationsTeam/AutomatedServerInstalls
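For readers new to autoinstall: instead of a preseed file, the installer reads a cloud-config style `user-data` file with a top-level `autoinstall:` key, normally served next to a (possibly empty) `meta-data` file. The sketch below builds a minimal one in Python under stated assumptions: the helper name, hostname, user and password hash are placeholders, and the `direct` storage layout is only one of the built-in layouts. Check the autoinstall reference linked above for the full schema of your release.

```python
import json


def build_user_data(hostname: str, username: str, password_hash: str) -> str:
    """Return a minimal autoinstall user-data document (hypothetical helper).

    json.dumps output is also valid YAML, so no third-party YAML library
    is needed to produce a file the installer can read.
    """
    config = {
        "autoinstall": {
            "version": 1,  # autoinstall schema version
            "identity": {
                "hostname": hostname,
                "username": username,
                "password": password_hash,  # a crypt(3) hash, not plain text
            },
            "ssh": {"install-server": True},  # install openssh-server
            "storage": {"layout": {"name": "direct"}},  # largest disk, no LVM
        }
    }
    # The file must start with this header to be read as cloud-config.
    return "#cloud-config\n" + json.dumps(config, indent=2) + "\n"


if __name__ == "__main__":
    print(build_user_data("node01", "ubuntu", "$6$exampleSalt$exampleHash"), end="")
```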

jabl, Apr 23 '20

The xCAT core team will add Ubuntu 20.04 LTS to our roadmap planning discussions, but we have no guidance on when to expect this feature to be added at this time. Contributions from xCAT community users could accelerate availability of Ubuntu 20.04 LTS in xCAT.

besawn, Apr 30 '20

Hi, is there a way to install xCAT on Ubuntu 20.04? Unfortunately, Ubuntu 20.04 is the only OS version available to us.

Hoeze, Aug 29 '20

Ubuntu 20.04 ships a live-server ISO that differs from prior Ubuntu versions: vmlinuz and initrd are located in the casper directory. After changing the kernel and initrd locations accordingly, provisioning gets stuck here:

/init: line 49: can't open /dev/sr0: No medium found
/init: line 49: can't open /dev/sr0: No medium found
Unable to find a medium container a live file system
Attempt interactive netboot from a URL?
yes no (default yes): yes
Two methods available for IP configuration:
  * static: for static IP configuration
  * dhcp: for automatic IP configuration
static dhcp (default 'dhcp'):

Some other files may need changes for this to work; I need to spend more time investigating this issue.
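The `/dev/sr0` probe and the "netboot from a URL?" prompt suggest casper was not told where to fetch the ISO. Below is a hedged sketch of the kernel arguments commonly used to netboot the 20.04 live server; the values follow Ubuntu's netboot instructions as we understand them, while the server address, paths and helper name are hypothetical. The optional `autoinstall` and `ds=nocloud-net;s=...` arguments point subiquity at an autoinstall seed and should skip its confirmation prompt. In a GRUB config the `ds=` argument usually needs quoting because of the semicolon.

```python
from typing import Optional


def live_server_cmdline(iso_url: str, seed_url: Optional[str] = None) -> str:
    """Build kernel arguments for netbooting the 20.04 live-server ISO (hypothetical helper)."""
    args = [
        "root=/dev/ram0",
        "ramdisk_size=1500000",  # RAM disk must be big enough to hold the whole ISO
        "ip=dhcp",               # configure the NIC in the initrd instead of prompting
        f"url={iso_url}",        # casper downloads the ISO rather than probing /dev/sr0
    ]
    if seed_url is not None:
        if not seed_url.endswith("/"):
            seed_url += "/"      # nocloud-net appends "user-data" / "meta-data" to this prefix
        args += ["autoinstall", f"ds=nocloud-net;s={seed_url}"]
    return " ".join(args)


if __name__ == "__main__":
    print(live_server_cmdline(
        "http://10.0.0.1/install/iso/ubuntu-20.04-live-server-amd64.iso",
        "http://10.0.0.1/install/autoinstall/node01",
    ))
```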

cxhong, Aug 31 '20

Ah, I was talking about running the xCAT server (xcat-server) on Ubuntu 20.04, not about deploying Ubuntu 20.04 to nodes. Should I open another issue for this?

Hoeze, Aug 31 '20

Oh, how did you install the Ubuntu 20.04 system?

Have you tried to install xCAT on Ubuntu 20.04? What kind of messages did you get?

cxhong, Aug 31 '20

@cxhong I tried installing via the instructions in the docs: https://xcat-docs.readthedocs.io/en/stable/guides/install-guides/apt/automatic_install.html

However, there is simply no repository for focal; bionic is the latest: http://xcat.org/files/xcat/repos/apt/latest/xcat-core/dists/

Hoeze, Aug 31 '20

Oh, I was wondering how you installed Ubuntu 20.04 on the system. Not from the Ubuntu 20.04 ISO file, right?

To install xCAT, you may try https://xcat-docs.readthedocs.io/en/stable/guides/install-guides/apt/configure_xcat.html , using bionic instead of focal, or try making a symbolic link.
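Both suggestions can be sketched as follows. This assumes the bionic packages install cleanly on focal, which is untested here; the `[arch=amd64]` option, the `main` component, the local-mirror layout and the helper names are all assumptions. With the symlink approach, apt may warn that the Release file's codename (bionic) does not match the requested suite (focal).

```python
import os
from pathlib import Path

# Base URL taken from the repository listing linked in this thread.
XCAT_CORE = "http://xcat.org/files/xcat/repos/apt/latest/xcat-core"


def sources_line(base_url: str = XCAT_CORE, suite: str = "bionic") -> str:
    """apt source entry that keeps using the bionic suite on a focal host."""
    # "[arch=amd64]" and the "main" component are assumptions about the repo layout.
    return f"deb [arch=amd64] {base_url} {suite} main\n"


def alias_suite(mirror_root: Path, existing: str = "bionic", alias: str = "focal") -> Path:
    """In a local mirror, create dists/<alias> as a symlink to dists/<existing>."""
    link = mirror_root / "dists" / alias
    if not link.is_symlink() and not link.exists():
        os.symlink(existing, link)  # relative target keeps the mirror relocatable
    return link
```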

cxhong, Aug 31 '20

Ah, I see. Ubuntu 20.04 was installed by our IT administration. It's a virtual machine, and I can only use it as-is.

Thanks for the hint, I can try this. I would prefer an officially supported repository for focal though :)

Hoeze, Aug 31 '20

Hi, are there any updates on this?

Hoeze, Nov 25 '20

Hi! We're having similar problems. Are there any updates on the support for Ubuntu 20.04? Is there an estimated time frame for it?

diegoarmino, Feb 04 '21

> ubuntu20 has live-server iso and has some difference than prior version of ubuntu. vmlinuz and initrd located in the casper dir. after modification for location of kernel and initrd, the provision is stucked here:
>
> /init: line 49: can't open /dev/sr0: No medium found
> /init: line 49: can't open /dev/sr0: No medium found
> Unable to find a medium container a live file system
> Attempt interactive netboot from a URL?
> yes no (default yes): yes
> Two methods available for IP configuration:
>   * static: for static IP configuration
>   * dhcp: for automatic IP configuration
> static dhcp (default 'dhcp'):
>
> some other files may need to changes for this to work. need to spend more time to investigate this issue.

Hi, I get stuck here too. Do you know how to fix it now?

JeffLee1874, Mar 16 '21

The xCAT core team does not currently have the resources to add Ubuntu 20.04 support to the project roadmap.

There is interest in Ubuntu 20.04 in the xCAT user community, so I would encourage those in the community that are interested in this support to share information with each other on specific problems that have been discovered and how they can be resolved. With enough community contributions in the form of issue documentation with resolution and eventually pull requests, Ubuntu 20.04 support in xCAT can move forward.

Currently xCAT cannot successfully deploy Ubuntu 20.04, so please plan accordingly.

besawn, Mar 16 '21

I'm digging into what's needed to get Ubuntu 20.04 deployable, at least for installs (I don't use network boot, so I'm not sure I should be messing with updating that). It's significantly different to older versions since it's using a completely new installer, but I've been able to get things to work with manual workarounds, and I have a decent idea what needs to be changed in the xCAT_plugin/debian.pm code.

I'm in the process of getting the necessary paperwork sorted with my employer, and hopefully I'll be able to start submitting pull requests soon.

sjjf, May 12 '21

@sjjf Thank you for any assistance you can provide.

besawn, May 12 '21

@besawn I've just submitted a draft pull request with my first cut of the implementation - #6975

I'd appreciate an xCAT dev looking over it to let me know if I'm going in completely the wrong direction or not - particularly with the question of whether or not I can assume that the pkgdir will be NFS exported, and thus usable for the install.
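Whether the pkgdir is NFS exported can at least be checked on the management node. A rough sketch, assuming the standard `/etc/exports` format of one export per line with the path as the first field; it ignores quoted paths, line continuations and filesystem-boundary subtleties (such as `crossmnt`), so treat a True result as "probably exported" rather than a guarantee. The function names are hypothetical.

```python
from pathlib import PurePosixPath


def exported_roots(exports_text: str) -> list:
    """Return the exported paths listed in /etc/exports-style text."""
    roots = []
    for line in exports_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if line:
            roots.append(line.split()[0])     # first field is the exported path
    return roots


def is_nfs_exported(pkgdir: str, exports_text: str) -> bool:
    """True if pkgdir is an exported path or lies underneath one."""
    target = PurePosixPath(pkgdir)
    for root in exported_roots(exports_text):
        root_path = PurePosixPath(root)
        if target == root_path or root_path in target.parents:
            return True
    return False
```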

sjjf, May 13 '21

> Ubuntu 20.04 LTS was released today, and we're eager to see support for it in xcat.
>
> Unfortunately it seems they have ripped out the old debian-installer so it needs more work than merely updating the preseed file, as preseeds are no longer supported. Instead they have a new installation system called "autoinstall", see https://wiki.ubuntu.com/FoundationsTeam/AutomatedServerInstalls

As a workaround for now, you can install Ubuntu 20.04 using the http://cdimage.ubuntu.com/ubuntu-legacy-server/releases/20.04/release/ubuntu-20.04.1-legacy-server-amd64.iso ISO, which still supports debian-installer. I only tested basic provisioning, but so far it's working. xCAT version: 2.15.1

0megam, Sep 04 '22

@omegarus as of 2.16.4 you should be able to deploy a standard Ubuntu 20.04 (or later - I haven't tested newer versions myself yet). The subiquity based installer used by newer Ubuntu is supported, though you'll need to make sure you update any installation templates to use the new cloud-init-alike configuration system rather than the debian-installer preseed configuration system.
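As an illustration of the difference, the steps of xCAT's preseed `late_command` (fetch the node's `.post` script, then run it chrooted in `/target`) might map onto autoinstall `late-commands` roughly as below. This is a hypothetical sketch, not taken from xCAT's shipped subiquity template: it assumes `wget` exists in the live installer environment, uses `curtin in-target` to run the script inside the installed system, and the URL and helper names are placeholders.

```python
import json


def late_commands(post_url: str) -> list:
    """Hypothetical autoinstall late-commands mirroring the xCAT preseed late_command."""
    return [
        "mkdir -p /target/var/log/xcat",
        # Assumes wget is present in the live installer environment.
        f"wget -O /target/root/post.script {post_url}",
        "chmod u+x /target/root/post.script",
        # Run the post script chrooted into the installed system.
        "curtin in-target --target=/target -- /root/post.script",
    ]


def autoinstall_fragment(post_url: str) -> str:
    """Serialise the commands under autoinstall/late-commands (JSON is valid YAML)."""
    doc = {"autoinstall": {"version": 1, "late-commands": late_commands(post_url)}}
    return json.dumps(doc, indent=2)


if __name__ == "__main__":
    print(autoinstall_fragment("http://10.0.0.1/install/autoinst/node01.post"))
```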

sjjf, Sep 06 '22

@sjjf I understand that, but in order to move to the new OS faster, without needing to change templates, we had to use the legacy way. In time I will migrate to the subiquity-based installer. I just wanted to mention this workaround here for people who can't upgrade to the latest xCAT.

0megam, Sep 06 '22

@omegarus Were you successful with a diskful or diskless install of ubuntu-20.04.1-legacy-server?

gurevichmark, Sep 06 '22

@gurevichmark I used it for a diskful installation. If you are using EFI, you need to update the elilo-xcat package to the latest one from version 2.16: https://xcat.org/files/xcat/repos/apt/2.16/xcat-dep/pool/main/e/elilo-xcat/elilo-xcat_3.14-6_all.deb

0megam, Sep 07 '22

@omegarus Which template did you use for the osimage definition, /opt/xcat/share/xcat/install/ubuntu/compute.tmpl? And what did you include in the pkglist?

When I try to provision a VM with ubuntu-20.04.1-legacy-server, it gets stuck here:

   ┌───────────────────┤ [!!] Finish the installation ├────────────────────┐
   │                                                                       │
   │                    Failed to run preseeded command                    │
   │ Execution of preseeded command "mkdir -p /target/var/log/xcat/; { cat │
   │ /tmp/pre-install.log >> /target/var/log/xcat/xcat.log; echo "Running  │
  ┌│ preseeding late_command Installation script..."; wget http://`cat     │
  ││ /tmp/xcatserver`:80/install/autoinst/c910f04x12v07.post; chmod u+x    │
  ││ c910f04x12v07.post; cp ./c910f04x12v07.post /target/root/post.script; │
  ││ mount -o bind /proc /target/proc -t proc; mount -o bind /dev          │
  ││ /target/dev; mount -o bind /dev/pts /target/dev/pts -t devpts; mount  │
  ││ -o bind /sys /target/sys; chroot /target /root/post.script; cp        │
  └│ /target/etc/network/interfaces /etc/network/interfaces; }             │
   │ >>/target/var/log/xcat/xcat.log 2>&1" failed with exit code 1.        │
   │                                                                       │
   │                              <Continue>                               │
   │                                                                       │
   └───────────────────────────────────────────────────────────────────────┘

The problem seems to be the line cp /target/etc/network/interfaces /etc/network/interfaces; in this entry in /install/autoinst/<node>:

d-i preseed/late_command string \
     mkdir -p /target/var/log/xcat/; \
     { \
     cat /tmp/pre-install.log >> /target/var/log/xcat/xcat.log; \
     echo "Running preseeding late_command Installation script..."; \
     wget http://`cat /tmp/xcatserver`:80/install/autoinst/c910f04x12v07.post; \
     chmod u+x c910f04x12v07.post; \
     cp ./c910f04x12v07.post /target/root/post.script; \
     mount -o bind /proc /target/proc -t proc; \
     mount -o bind /dev /target/dev; \
     mount -o bind /dev/pts /target/dev/pts -t devpts; \
     mount -o bind /sys /target/sys; \
     chroot /target /root/post.script; \
     cp /target/etc/network/interfaces /etc/network/interfaces; \
     } >>/target/var/log/xcat/xcat.log 2>&1

It gets logged in /var/log/xcat/xcat.log as: cp: can't stat '/target/etc/network/interfaces': No such file or directory

If I remove that line, the VM boots into Ubuntu 20.04.1:

root@c910f04x12v07:~# hostnamectl
   Static hostname: c910f04x12v07
         Icon name: computer-vm
           Chassis: vm
        Machine ID: c9095f1b8ea2d0927cd2762b631a0e41
           Boot ID: 03672f6c7034464ba2f12f851dbb6125
    Virtualization: kvm
  Operating System: Ubuntu 20.04.1 LTS
            Kernel: Linux 5.4.0-125-generic
      Architecture: x86-64
root@c910f04x12v07:~#
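Rather than deleting the line, a hedged alternative is to make the copy conditional, so it still runs on releases that do create the file. The log shows `/target/etc/network/interfaces` is absent (Ubuntu has defaulted to netplan instead of ifupdown since 17.10), and because `cp` is the last command in the `{ ...; }` group, its failure becomes the exit status of the whole late_command. The sketch patches the text in Python; the helper name is hypothetical, and since `/install/autoinst/<node>` is regenerated by `nodeset`, the same change would likely need to go into the osimage's template to persist.

```python
CP_LINE = "cp /target/etc/network/interfaces /etc/network/interfaces;"
GUARDED_LINE = (
    "if [ -f /target/etc/network/interfaces ]; then "
    "cp /target/etc/network/interfaces /etc/network/interfaces; fi;"
)


def guard_interfaces_copy(preseed_text: str) -> str:
    """Make the late_command copy conditional instead of deleting it (hypothetical helper).

    An if-statement whose test fails exits with status 0, so the { ...; } group
    no longer fails when the file is missing.
    """
    if GUARDED_LINE in preseed_text:
        return preseed_text  # already patched; keep the edit idempotent
    return preseed_text.replace(CP_LINE, GUARDED_LINE)
```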

gurevichmark, Sep 07 '22

Are there any updates on using xCAT to deploy Ubuntu 20.04(+?) client systems? (Our deployment server is CentOS.)

Besides the tweak to /install/autoinst/<node> that @gurevichmark needed to make, are there gotchas that prevent at least the legacy-server image from working?

@sjjf stated in one comment that the "subiquity based installer used by newer Ubuntu is supported". Is there any documentation, or are there any blog posts, on this?

dmagdavector, Apr 17 '23

Hello,

I too am interested to know whether Ubuntu 20.04 and 22.04 support has progressed any further.

OCF are seeing an increasing number of clients who want Ubuntu-provisioned nodes for NVIDIA AI/ML stacks, or who are simply following software vendors' decisions to abandon CentOS and EL-based operating systems in favour of Ubuntu. Whenever Ubuntu is mandatory, we have to pitch an alternative to xCAT, commonly a commercial product with zero integration with any of our own in-house stack. Support for modern Ubuntu and Debian operating systems would be a huge tick towards retaining xCAT long term.

I can see some of the work has already been done as a starting point and this might be something I can pick up later in the year if there has been no movement and our road map takes me there.

Regards, Matt.

ocfmatt, Jun 02 '23

@ocfmatt I'm currently using 2.16.5 (2.16.5-snap202303030906 on an Ubuntu Bionic node) to deploy Ubuntu 20.04 and 22.04 nodes using the Subiquity installer without any issues - I'm not using the template that ships with xcat-server (share/xcat/install/ubuntu/compute.subiquity.tmpl), but with a completely custom template (see the end of the comment for more details) the installation works for me reliably and without issues.

I'm not in a great position to test out the default template, since I tend to do builds in bursts - I'm currently in a lull, and having a look at it I suspect it'd need some work to behave sensibly, mostly around the automated partitioning (which is always the fiddly thing with xCAT builds, from my experience). I really don't have the time or available test hardware to do the work that would be needed to figure out all the wrinkles around the automated partitioning, but to be honest I think the sensible option for most sites would be to invest a bit of time finding a partitioning scheme that worked with their hardware and use that rather than the shipped templates.

The other thing that ends up being a pain is that newer versions of Ubuntu (and probably other distros) ship with different/updated default software stacks that can trigger issues with the supplied scripts. For example, I ran into a problem with the sshd_config created by the remoteshell script - it was setting MaxStartups 1024, which broke things somewhere between OpenSSH 8.2 and 8.9 (Focal and Jammy). I worked around that with an additional script run later in the build that removed the setting, but it was a right pain figuring out what was going on there, and that wasn't the only issue I ran into, just the one I can remember right now. All that means a custom template and a significant amount of tweaks and workarounds are likely to be your best option, at least until someone can hammer the wrinkles out of the current default templates and support scripts. . . . and I guess I should really submit a bug report and patch for that MaxStartups issue, at some point, in my copious spare time . . . .
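A sketch of the kind of later script described here, which strips the `MaxStartups` setting so sshd falls back to its default. Assumptions: the caller reads `/etc/ssh/sshd_config` and writes the result back, the helper names are hypothetical, and commented-out lines are left alone. Validating with `sshd -t` before reloading sshd is prudent.

```python
import re


def _keyword(line: str) -> str:
    """Return the lower-cased keyword of an active sshd_config line, or ''."""
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return ""
    # sshd_config accepts both "Keyword value" and "Keyword=value".
    return re.split(r"[\s=]", stripped, maxsplit=1)[0].lower()


def drop_directive(sshd_config: str, keyword: str = "MaxStartups") -> str:
    """Remove active lines setting `keyword` (sshd keywords are case-insensitive).

    With the line gone, sshd uses its built-in default (10:30:100 per sshd_config(5)).
    """
    return "".join(
        line
        for line in sshd_config.splitlines(keepends=True)
        if _keyword(line) != keyword.lower()
    )
```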

The final thing I've hit is issues with the genesis image on newer hardware - I ended up putting quite a lot of work into developing a new version based on Fedora 36 (rather than, I think, 28 or thereabouts, and a 3.something kernel), which got to the point where it did what I needed to get the new hardware up and then I didn't have time to clean up for submission upstream. The current state of that is at https://github.com/sjjf/xcat-core/tree/fedora36_genesis , but it'll need work to make it usable by anyone else, and a lot of work to make it mergeable (if it's even acceptable to the core xCAT devs).

I've created a gist (https://gist.github.com/sjjf/5dfa794b7eb8d8712620c86d2c881518) with a lightly edited version of a template that I /know/ works with 22.04 and EFI hardware - feel free to use that as a base if you think it'd be useful. As I said, It Works For Me(tm).

sjjf, Jun 03 '23

> Are there any updates on using xCat to deploy Ubuntu 20.04(+?) client systems? (Our deployment server is CentOS.)

I've been using it in my environment to deploy 20.04 and 22.04 nodes for a while now - it works reliably for me, with a custom template and some tweaking of the supplied pre- and post-scripts to work around issues triggered by updated/changed default software in the newer Ubuntu releases.

> @sjjf stated in one comment that the "subiquity based installer used by newer Ubuntu is supported". Is there any kind of documentation / weblog posts on this?

I posted a gist in my other recent comment which has a template that works for me with 22.04 - I pretty much just point my customised osimage at that template, rinstall away, and it works. It's not very different from setting up any other custom osimage based build, at least as far as I can tell - I may be doing things in a way that doesn't match normal xCAT best practise, though, so maybe I'm just missing a pain point that most people would run into . . .
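Pointing an existing osimage at a custom template and reinstalling comes down to two xCAT commands, `chdef` and `rinstall`. The sketch below only assembles and prints them unless `--apply` is given; the osimage name, template path and noderange are hypothetical, and it assumes the osimage already exists (for example, one created by `copycds`).

```python
import shlex
import subprocess
import sys


def xcat_commands(osimage: str, template: str, noderange: str) -> list:
    """Return argv lists for retargeting an osimage template and reinstalling nodes."""
    return [
        ["chdef", "-t", "osimage", "-o", osimage, f"template={template}"],
        ["rinstall", noderange, f"osimage={osimage}"],
    ]


if __name__ == "__main__":
    # Hypothetical names; pass --apply to actually run them on the management node.
    cmds = xcat_commands(
        "ubuntu22.04-x86_64-install-custom",
        "/install/custom/ubuntu/compute.subiquity.tmpl",
        "node01",
    )
    for argv in cmds:
        print(shlex.join(argv))
        if "--apply" in sys.argv:
            subprocess.run(argv, check=True)
```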

I'd love to volunteer to update documentation, but I really don't have time - I generally end up working on this stuff when it's not doing what I need, and I rarely have time to put in all the work needed to wrap everything up in a pretty bow . . . which is obviously not a good way to think about documenting things properly, but once I've got something working well enough for me going a long way past that to make something complete, neat and tidy, and properly documented is generally more than I have time for.

sjjf, Jun 03 '23

@sjjf thanks for your comments above. My company, OCF, are an HPC systems integrator continuously deploying to different sites with different hardware and software configurations. Hardware for testing this on isn't going to be a problem for me, and time shouldn't be once I get this on our road map for later this year. Repeatability on differing setups is what I'll need from this, so I'm very pleased you have shared what you have so far, and I'm happy to take that further and refine it into a mergeable candidate.

> I posted a gist in my other recent comment which has a template that works for me with 22.04 - I pretty much just point my customised osimage at that template, rinstall away, and it works. It's not very different from setting up any other custom osimage based build, at least as far as I can tell - I may be doing things in a way that doesn't match normal xCAT best practise, though, so maybe I'm just missing a pain point that most people would run into . . .

Deploying Ubuntu 2X.04 client nodes isn't supported, so best practice is being broken just by doing this :)

> I'd love to volunteer to update documentation, but I really don't have time - I generally end up working on this stuff when it's not doing what I need, and I rarely have time to put in all the work needed to wrap everything up in a pretty bow . . . which is obviously not a good way to think about documenting things properly, but once I've got something working well enough for me going a long way past that to make something complete, neat and tidy, and properly documented is generally more than I have time for.

I will be able to cover this off, if I get it into a working position, as part of my release process and upstream contribution.

Thanks, Matt.

ocfmatt, Jun 05 '23

I've got a cluster that's running 18.04 for the head node and for the diskless clients. Ideally, I stay Ubuntu and get to 22.04 for both. @sjjf has clients running 22.04 -- is that diskless? @ocfmatt -- Did you make any progress there?

Has anyone gotten the head / management node running on anything newer than 18.04? Or should I bite the bullet and move all of this to something like Rocky?

celstark, Dec 11 '23

@celstark, unfortunately I haven't had the time to get around to this on my road map yet and had to set it to Jan-Mar 24.

The best support for xCAT servers and clients is on RHEL or other derived operating systems, and it's unlikely the current Ubuntu release will make it into xCAT2. I'll certainly be giving it a proper try in the next quarter, and if making it work isn't too hacky or unrepeatable, then Ubuntu provisioning could make its way into my production installations.

If you want to use xCAT and don't have a hard dependency on Ubuntu, going down the RHEL or Rocky route will be your simplest option.

Regards, Matt.

ocfmatt, Dec 11 '23

Thanks @ocfmatt -- the challenge is the inertia / sunk cost. The cluster is already running nicely on xCAT and Ubuntu 18 LTS, but Ubuntu 18 LTS is long in the tooth. Upgrading it to 20 or 22 is easier than a wholesale shift to something RHEL-based. Well, it would be, were it not for the xCAT issues. But sometimes life doesn't give you the easy path ;)

celstark, Dec 11 '23

@celstark

  • Is it even possible to migrate cleanly from Ubuntu 18 to Ubuntu 22, or even 20? My daily driver has been Ubuntu LTS since 14.04, and I've never had a clean dist-upgrade in that time. In server environments the stability concerns are even greater than in a desktop/laptop environment, so I expect the disruption to be worse.
  • A dist-upgrade is no easier than a shift to RHEL/CentOS (RIP)/Rocky 8, but that's just my opinion, not a hard fact (my workstations have been Ubuntu LTS variants, but my servers were always CentOS until the Stream debacle).
  • Finally, if a shift is required, maybe Debian could be considered a viable alternative?

samveen, Dec 12 '23