
CentOS 8

Open jakirkham opened this issue 4 years ago • 27 comments

A few use cases for CentOS 8 have come up recently, namely CUDA support for ARM and PPC64LE. Potentially more use cases will show up in the future. I'm opening this issue so that we can discuss how best to handle this need.

cc @jaimergp @kkraus14 @isuruf @beckermr @conda-forge/core

jakirkham avatar May 03 '21 18:05 jakirkham

CentOS 8 EOL is December 31 of this year. I don't think implementing support for it is well-motivated. Vendors will have to move on from it anyways.

beckermr avatar May 03 '21 18:05 beckermr

Right so maybe we need to use an alternative. Rocky Linux has come up before.

Also here's a longer post on alternatives: https://haydenjames.io/what-centos-alternative-distro-should-you-choose/

jakirkham avatar May 03 '21 18:05 jakirkham

However, it's worth noting we are using upstream Docker images for these architectures & CUDA versions. So the OS is already fixed.

Edit: Raised upstream issue ( https://gitlab.com/nvidia/container-images/cuda/-/issues/123 ) about this

jakirkham avatar May 03 '21 18:05 jakirkham

@chenghlee What is Anaconda moving to? Mirroring that is likely a good idea.

beckermr avatar May 03 '21 18:05 beckermr

A suggestion brought up on the upstream NVIDIA CUDA image repo was to look at Red Hat's Universal Base Images (UBI), which are also being supplied for these architectures. I have not looked at these closely yet, but they might be something else to consider.

jakirkham avatar May 05 '21 01:05 jakirkham

Anaconda's current plan is to stay on CentOS/RHEL 7 (glibc 2.17) as much as possible for the packages on repo.anaconda.com (defaults); if we need a newer glibc for some reason, we'll likely look at Debian 9 or 10.

chenghlee avatar May 05 '21 14:05 chenghlee

> if we need a newer glibc for some reason, we'll likely look at Debian 9 or 10.

Debian 9 is not supported by CUDA: https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#system-requirements

The only OS supported by CUDA across x86_64, PPC64, and ARM SBSA is RHEL 8 (since CentOS 8 isn't really a thing anymore).

kkraus14 avatar May 05 '21 16:05 kkraus14

> if we need a newer glibc for some reason, we'll likely look at Debian 9 or 10.

Debian 9 is also EOL in about a year, and Debian 10 uses the same base glibc version as CentOS 8, so I'd advise going for 10 when really needed. EDIT: Going for Debian 10 would be problematic for Ubuntu 18.04, though (and Debian 9 also for Ubuntu 16.04, but, as I now learned, that Ubuntu version has been EOL for a couple of days by now).

mbargull avatar May 05 '21 17:05 mbargull

Hey, I would like to raise this issue again. Do we know which OS we should use? Should we use ubi8? Maybe we can start with CentOS 8 now and take some time before the end of the year to decide?

jjacobelli avatar Jun 04 '21 10:06 jjacobelli

When we previously discussed supporting other architectures that needed CentOS 8 for deployment, we came to the conclusion that we may be able to actually build things on CentOS 7. We just wouldn't be able to load any libraries (via a Python import or otherwise). However, as we already don't do these things with GPU builds, this may not actually present much of a problem. That was the thinking, anyway.

There are a few reasons we were thinking of this (admittedly somewhat hacky solution).

First, CentOS 8's EOL is really quite soon (December 2021; yes, this year). Also, the new CentOS release system, CentOS Stream, really doesn't work well for our use case (building with a really old glibc that is supported by the vast majority of systems out there). So we may find ourselves abandoning CentOS for some other solution in the future. What that future solution will be is somewhat unclear, but there are a few options being considered (Debian, Rocky Linux, UBI, etc.).

Second, adding a new OS (like CentOS 8) involves a fair bit of work: building Docker images, rebuilding the compiler toolchain, building CDTs, etc. So for something that won't be around for more than a few months, it really isn't worth undertaking that work, at least not in conda-forge as a whole.

There are probably more reasons that I'm forgetting, but those are already fairly significant considerations.

Anyway, to tie a bow on this: we might want to try just using CentOS 7 in the cases where we need it and see how that goes. That said, it still isn't quite that simple, but maybe we can discuss the other points offline.

jakirkham avatar Jun 04 '21 17:06 jakirkham

I am under the impression that we can just install a CUDA runfile distribution in a vanilla cos7 image. I think this would work for x86-64, but I am less certain about aarch64 or ppc64le.

leofang avatar Jun 04 '21 18:06 leofang

The current Docker images are covered under the NVIDIA licensing agreement. I'm not sure that would be true of some custom-made image. This is something we would need to figure out.

jakirkham avatar Jun 04 '21 18:06 jakirkham

> When we had discussed supporting other architectures that needed CentOS 8 for deployment previously, we came to the conclusion that we may be able to actually build things on CentOS 7. We just wouldn't be able to load any libraries (via a Python import or otherwise). However as we already don't do these things with GPU builds, this may not actually present much of a problem. This was the thinking anyway

Building the packages on CentOS 7 should work, but the issue I'm facing right now is that some packages (e.g. cudatoolkit) run tests at the end of the build that may require loading the libraries, and these fail if we don't have the right glibc version. Should we consider not running these tests on architectures other than x86_64? Example of a failing CI run: https://dev.azure.com/conda-forge/feedstock-builds/_build/results?buildId=332546&view=logs&j=81eb4d60-76fc-5ac4-a959-9ebb9871bfee&t=e733809a-cb57-567e-b6dc-c69c35a56404

===== testing package: cudatoolkit-11.2.2-h24a0247_8 =====
running run_test.py
Finding cublas from Conda environment
	located at $PREFIX/lib/libcublas.so.11.4.1.1043
	trying to open library...	ok
Finding cusparse from Conda environment
	located at $PREFIX/lib/libcusparse.so.11.4.1.1152
	trying to open library...	ok
Finding cufft from Conda environment
	located at $PREFIX/lib/libcufft.so.10.4.1.152
	trying to open library...	ok
Finding curand from Conda environment
	located at $PREFIX/lib/libcurand.so.10.2.3.152
	trying to open library...	ERROR: failed to open curand:
/lib64/libm.so.6: version `GLIBC_2.27' not found (required by $PREFIX/lib/libcurand.so.10.2.3.152)
Finding nvvm from Conda environment
	located at $PREFIX/lib/libnvvm.so.4.0.0
	trying to open library...	ok
Finding cudart from Conda environment
	located at $PREFIX/lib/libcudart.so.11.2.152
	trying to open library...	ok
Finding cudadevrt from Conda environment
	located at $PREFIX/lib/libcudadevrt.a
Finding libdevice from Conda environment
	searching for compute_20...	ok
	searching for compute_30...	ok
	searching for compute_35...	ok
	searching for compute_50...	ok
Tests failed for cudatoolkit-11.2.2-h24a0247_8.tar.bz2 - moving package to /home/conda/feedstock_root/build_artifacts/broken
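As an aside, the minimum glibc a given library needs can be read straight off its versioned symbol references (the library path below is illustrative; point it at whichever library is failing):

```shell
# The highest GLIBC_x.y version referenced is the minimum glibc required
# at runtime. (libcurand path taken from the log above; substitute as needed.)
objdump -T "$PREFIX/lib/libcurand.so.10.2.3.152" \
  | grep -o 'GLIBC_[0-9.]*' \
  | sort -Vu \
  | tail -1
# e.g. prints GLIBC_2.27, matching the dlopen error above
```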

jjacobelli avatar Jun 08 '21 13:06 jjacobelli

Yeah I think not running the tests or running parts of the tests that don't require library loading would be preferable.

One other thing we might consider is checking the GLIBC version, or trying to load the libraries (and not erroring if that fails). This can be useful as we can still opt to run these tests on systems with a new enough GLIBC.

For example, CuPy has similar checks where it won't run some tests if a GPU is missing. This allows us to still run the tests on systems that do have a GPU. We can also use the conda build --test command with the produced package to run its tests on a system with a GPU to make sure it works OK. Mentioning all of this as we can use similar strategies with the cudatoolkit packages on ARM.
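A minimal sketch of that guard (helper names are hypothetical, not from the actual run_test.py):

```python
import ctypes
import platform


def can_load(libname):
    """Try to dlopen a shared library; return False instead of raising,
    e.g. when the system glibc is older than the library requires."""
    try:
        ctypes.CDLL(libname)
        return True
    except OSError:
        return False


def glibc_at_least(required):
    """Best-effort check of the running glibc version via the stdlib."""
    libc, version = platform.libc_ver()
    if libc != "glibc" or not version:
        return False  # non-glibc system, or version not detectable
    parse = lambda v: tuple(int(p) for p in v.split("."))
    return parse(version) >= parse(required)


# Only attempt the load test when glibc is new enough, mirroring how
# GPU-dependent tests are skipped when no GPU is present.
if glibc_at_least("2.27"):
    ok = can_load("libcurand.so.10")  # hypothetical target library
else:
    print("glibc too old; skipping library-load test")
```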

jakirkham avatar Jun 08 '21 18:06 jakirkham

The massively reduced support for CentOS 8 is really a pity for this next step. Assuming AlmaLinux and/or RockyLinux can uphold their promises of 1:1 compatibility with the pre-stream CentOS, I still think it would be interesting to try them out?

So far, both only support x86 & aarch64; I haven't found anything about PPC yet. Rocky Linux just released 8.4 RC; AlmaLinux was a bit faster there (though aarch64 support doesn't show up on the main page yet).

h-vetinari avatar Jun 09 '21 12:06 h-vetinari

Rocky Linux & AlmaLinux 8.4 were released a few days ago (both for x86 & aarch64). No update about PPC support yet, but I've asked on their respective Discourse servers.

Regarding compatibility, here are the relevant quotes from their websites (emphasis mine):

Rocky Linux:

> Rocky Linux is a community enterprise operating system designed to be 100% bug-for-bug compatible with America's top enterprise Linux distribution now that its downstream partner has shifted direction. It is under intensive development by the community. Rocky Linux is led by Gregory Kurtzer, founder of the CentOS project.

AlmaLinux:

> Governed and driven by the community, focused on long-term stability and providing a robust production-grade platform that is 1:1 binary compatible with pre-Stream CentOS and RHEL®.

Since both promise 1:1 compatibility so prominently, could this not be an option?

h-vetinari avatar Jul 03 '21 12:07 h-vetinari

> No update about PPC support, but I've asked on their respective discourse servers.

Update: Rocky Linux is planning a PPC release soon.

h-vetinari avatar Jul 04 '21 22:07 h-vetinari

Reviving the thread. PEP 600 has opened a way to build binaries targeting newer versions of glibc; auditwheel 5.3.0 supports glibc 2.35 and older.

It's high time the conda ecosystem evolved beyond glibc 2.17 to accommodate Python packages built with newer toolchains whose runtime libraries require glibc > 2.17.

Settling the question of what the next version should be is hard, but must it be a single version everyone agrees to? Is it possible for conda to have multiple sysroot versions for the same platform?

oleksandr-pavlyk avatar Feb 13 '23 22:02 oleksandr-pavlyk

We can have multiple sysroots at once, so that helps a lot. Right now we support 2.12 and 2.17. I suspect the next one to add is 2.27 or 2.28. We have a related question of which new distribution to adopt for building CDTs and supplying our default Linux environment.
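For reference, the per-feedstock opt-in to a given sysroot already looks roughly like this (a sketch based on the conda-forge docs of the time; the exact keys and pins may have evolved):

```yaml
# meta.yaml -- pin the desired sysroot in the build requirements
requirements:
  build:
    - {{ compiler('c') }}
    - sysroot_linux-64 2.17  # [linux64]

# conda-forge.yml -- request a matching CentOS 7-based CI image
os_version:
  linux_64: cos7
```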

beckermr avatar Feb 13 '23 22:02 beckermr

> It's high time the conda ecosystem evolved beyond glibc 2.17 to accommodate Python packages built with newer toolchains whose runtime libraries require glibc > 2.17.
>
> Settling the question of what the next version should be is hard, but must it be a single version everyone agrees to? Is it possible for conda to have multiple sysroot versions for the same platform?

A lot of similar discussion happened for manylinux_2_28 (the successor to manylinux2014 == manylinux_2_17), which has some relevant information[^1], particularly the realisation that AlmaLinux / Rocky Linux / RHEL UBI are all effectively full-fledged replacements for CentOS[^2].

[^1]: even though manylinux has harder constraints than conda-forge (no compiler infrastructure of its own), and therefore settled on one of the several RHEL-alikes (following the demise of CentOS), which benefit from the devtoolset backports of newer GCCs to old OSes.

[^2]: This was after a failed attempt at getting the Debian-based manylinux_2_24 off the ground (to reduce the glibc version jump from CentOS 7).

Even though we technically could have several, I think it would be easiest to just choose one of those RHEL-ABI-compatible distros, which would also continue in the spirit of why CentOS was a good choice previously.

h-vetinari avatar Feb 14 '23 00:02 h-vetinari

We could look at AlmaLinux 8, which has aarch64 & ppc64le support

jakirkham avatar Feb 14 '23 00:02 jakirkham

Yeah alma 8 is a good choice.

beckermr avatar Feb 14 '23 00:02 beckermr

I just saw that, as of LLVM 17, libcxx only supports glibc >= 2.24. LLVM 17 will be released in a couple of months.

The good thing is that (compared to e.g. https://github.com/conda-forge/conda-forge.github.io/issues/1844), libcxx-on-linux isn't part of our default compiler stack. GCC & libstdc++ claim to only require the 20+ year old glibc 2.3, though I'm doubtful anyone still tests GCC with something that old.

OTOH, libstdc++ docs also say:

> 4.7. Recent GNU/Linux glibc required?
>
> [...] The guideline is simple: the more recent the C++ library, the more recent the C library. (This is also documented in the main GCC installation instructions.)

In fact, since Microsoft finally started supporting C11/C17 a few years ago (thus unblocking cross-platform projects from having to stay on C89), several projects are now moving to require C11 (including CPython), which needs a newer glibc; cf. e.g. https://github.com/conda-forge/linux-sysroot-feedstock/issues/44.

While I don't want to rehash the discussion in https://github.com/conda-forge/conda-forge.github.io/issues/1436, the EOL of CentOS 7 is now about a year away, from which point the bitrot of support for old glibc (resp. the move towards requiring C11+) will likely accelerate even more. We definitely need newer sysroots soon.

h-vetinari avatar Apr 18 '23 01:04 h-vetinari

Last I checked alma 8 was a good choice for us.

There is a list of todo items for whatever we choose

  • [ ] check with anaconda on what they are doing
  • [ ] decide on the CDT name (e.g. alma8)
  • [ ] put in changes to the CDT scripts to build alma8 ones
  • [ ] build the sysroots
  • [ ] assemble docker images

I'm sure I'm missing stuff, but that's a start.

beckermr avatar Apr 18 '23 01:04 beckermr

> Last I checked alma 8 was a good choice for us.

This also aligns with manylinux_2_28, which is AlmaLinux 8-based. This gives us glibc 2.28, which does have a few possible incompatibilities:

  • Ubuntu 18.04 isn't quite EOL yet and uses glibc 2.27. Ubuntu 20.04 upgraded to glibc 2.31 and Ubuntu 22.04 upgraded to glibc 2.35.
  • SUSE 12 isn't EOL and uses glibc 2.22. SUSE 15 upgraded to glibc 2.31.

kkraus14 avatar Apr 18 '23 02:04 kkraus14

> This also aligns with manylinux 2_28 which is AlmaLinux 8 based

Yup, I linked the discussion to the pypa/manylinux issue where this was decided a bit further up.

> This gives us glibc 2.28 which does have a few possible incompatibilities

In the context of the discussion from #1436, this is not about raising the lower bound to 2.28, but about making it possible to build packages that (for whatever reason) require glibc > 2.17. IOW, the upstream maintainers have already lifted their floor past our current ceiling, so we need a way to build such packages. But dropping CentOS 6, much less 7, is a whole 'nother ballpark[^1].

[^1]: I'd be in favour of the former, but there was a lot of discussion in #1436, and for now the opt-in to newer glibcs seems to be working well enough that we haven't been forced to abandon CentOS 6 as the default yet.

h-vetinari avatar Apr 18 '23 02:04 h-vetinari

Thanks for adding this to the agenda, Axel! 🙏

jakirkham avatar Apr 18 '23 08:04 jakirkham