conda-forge.github.io

when to drop (or add) ppc support

Open minrk opened this issue 8 months ago • 25 comments

Your question:

The linux-ppc64le target platform seems to regularly require the most effort of all platforms to keep working, while also being the least used by at least 1-2 orders of magnitude. I've been frustrated by this, and have started to remove support from my feedstocks if/when ppc failures are blocking otherwise working builds on platforms where I know people actually use these packages. Since I'm not actually sure the number of users of my ppc packages exceeds zero, it starts to feel pretty bad to spend all this volunteer effort on users who may not even exist.

Is there any guidance for maintainers on how to decide if/when to support a platform for a given package, or drop it if it used to work but is now causing problems? And maybe how to best go about (temporarily) stopping builds on a platform? I know removing the ppc line from conda-forge.yml works, as does skip: target_platform == 'linux-ppc64le', but I'm not sure what is best for bots and such, or if there's any information that should go elsewhere when dropping support, whether the drop is temporary, indefinite, or permanent.

minrk avatar Apr 23 '25 10:04 minrk

The bot adds ppc64le once and then moves on.

IMHO, maintainers should feel free to drop support if they do not have the energy for it. People who need it can come and add it back if they like.

beckermr avatar Apr 23 '25 11:04 beckermr

I understand the sentiment.

Whenever I looked at download numbers (e.g. 1, 2), PPC is at least a factor of 10 behind the second-least-used platform, and less than 1-in-350 users overall. For example, using current libzlib downloads as a proxy:

| artefact | uptime | downloads | % | 1 PPC user to <target> users |
|---|---|---|---|---|
| win-arm64/libzlib-1.3.1-hfbbf558_2.conda | 6 months, 20 days | 174 | 0.00% | - |
| linux-ppc64le/libzlib-1.3.1-h190368a_2.conda | 6 months, 20 days | 44'392 | 0.27% | 1 |
| linux-aarch64/libzlib-1.3.1-h86ecc28_2.conda | 6 months, 20 days | 448'351 | 2.73% | 10 |
| osx-64/libzlib-1.3.1-hd23fc13_2.conda | 6 months, 20 days | 532'977 | 3.25% | 12 |
| osx-arm64/libzlib-1.3.1-h8359307_2.conda | 6 months, 20 days | 690'866 | 4.21% | 16 |
| win-64/libzlib-1.3.1-h2466b09_2.conda | 6 months, 20 days | 1'551'925 | 9.46% | 35 |
| linux-64/libzlib-1.3.1-hb9d3cd8_2.conda | 6 months, 20 days | 13'132'481 | 80.07% | 296 |
| overall | - | 16'401'166 | 100.00% | 369 |
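As a rough check, the percentage and ratio columns of the libzlib table can be recomputed from the raw download counts. This is only a sketch: the counts are copied from the table above, and the helper names `share` and `users_per_ppc_user` are made up for illustration.

```python
# Download counts copied from the libzlib table in this comment.
downloads = {
    "win-arm64": 174,
    "linux-ppc64le": 44_392,
    "linux-aarch64": 448_351,
    "osx-64": 532_977,
    "osx-arm64": 690_866,
    "win-64": 1_551_925,
    "linux-64": 13_132_481,
}

total = sum(downloads.values())
ppc = downloads["linux-ppc64le"]


def share(count: int) -> float:
    """Percentage of all downloads, rounded to two decimals."""
    return round(100 * count / total, 2)


def users_per_ppc_user(count: int) -> int:
    """How many downloads this count represents per single PPC download."""
    return round(count / ppc)


for platform, count in downloads.items():
    print(f"{platform:<14} {share(count):6.2f}%  {users_per_ppc_user(count):4d}")
print(f"{'overall':<14} {share(total):6.2f}%  {users_per_ppc_user(total):4d}")
```

The "overall" row is where the "less than 1-in-350" figure comes from: total downloads divided by linux-ppc64le downloads.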

The numbers are the same for openssl, qualitatively

| artefact | uptime | downloads | % | 1 PPC user to <target> users |
|---|---|---|---|---|
| linux-ppc64le/openssl-3.5.0-hede31bd_0.conda | 15 days | 3'502 | 0.21% | 1 |
| linux-aarch64/openssl-3.5.0-hd08dc88_0.conda | 15 days | 47'608 | 2.81% | 14 |
| osx-arm64/openssl-3.5.0-h81ee809_0.conda | 15 days | 81'340 | 4.79% | 23 |
| osx-64/openssl-3.5.0-hc426f3f_0.conda | 15 days | 104'865 | 6.18% | 30 |
| win-64/openssl-3.5.0-ha4e3fda_0.conda | 15 days | 172'111 | 10.14% | 49 |
| linux-64/openssl-3.5.0-h7b32b05_0.conda | 15 days | 1'287'324 | 75.87% | 368 |
| overall | - | 1'696'750 | 100% | 485 |

OTOH, several 1000s of users isn't nothing either 🤷

Historically, we had a lot of problems with emulation in numpy/scipy/cvxpy, which at some point led to dropped builds and/or skipped testing. Recently, I've had fewer issues with PPC though. If a recipe builds on linux-aarch64, it usually also builds on linux-ppc64le, so I've kept support where it wasn't getting in the way.

The fact that CUDA dropped support for PPC is relevant here as well, and we're missing certain key packages like pytorch too.

> IMHO, maintainers should feel free to drop support if they do not have the energy for it. People who need it can come and add it back if they like.

A couple of years ago when support was first added, there were (non-core) contributors who helped get the arches going on several feedstocks I maintained, but then disappeared. I felt responsible for keeping things running and wasted a lot of time on ppc support at the time. So it's easy as a maintainer to get "trapped" in that sense.

I don't think it's realistic that conda-forge ever drops ppc support, but I'd be OK to empower maintainers to drop ppc support and reject it even if someone comes with a PR. Unless that person signs up for future maintenance, it will just mean ppc gets dropped at the next issue that comes up.

h-vetinari avatar Apr 23 '25 22:04 h-vetinari

> OTOH, several 1000s of users isn't nothing either 🤷

What fraction of that is Conda Forge CI? According to the migration status, those both have 2.2K children, so it seems possible that building other ppc64le packages could drive many downloads.

mfansler avatar Apr 24 '25 08:04 mfansler

> What fraction of that is Conda Forge CI?

It should be essentially 0. I don't know the specifics, but our downloads from our CI aren't (supposed to be) counted. The first ~10 downloads or so are due to various mirrors(?) or so (at least, I've never seen a main package with less than that), but after that it should only be cf-external users.

h-vetinari avatar Apr 24 '25 09:04 h-vetinari

> It should be essentially 0

Perhaps it should be, but I'm not sure it is. I just triggered a rebuild of slepc and all the petsc downloads incremented by one. It seems unlikely to me that something else downloaded all the ppc builds the exact same number of times the CI builds did in the same 20 minutes I was watching, when they only had 7 downloads before. It does seem like every CI download is counted.

minrk avatar Apr 24 '25 20:04 minrk

Yes my guess is that CI downloads are counted.

beckermr avatar Apr 24 '25 23:04 beckermr

I find that ppc64le struggles a lot with the graphics stack. My understanding is that ppc64le is exclusively used in data centers, so I've been liberal with dropping it when it causes me problems for those packages...

hmaarrfk avatar Apr 27 '25 12:04 hmaarrfk

> Perhaps it should be, but I'm not sure it is. I just triggered a rebuild of slepc and all the petsc downloads incremented by one.

I don't have the exact mechanics, but roughly: the first download is when the package makes it through the CDN, and as I mentioned

> The first ~10 downloads or so are due to various mirrors(?) or so

By "10", I literally mean ten. IME, anything <=10-12 downloads is completely unused (again, I don't know which factors exactly conspire for that to be the case).

> Yes my guess is that CI downloads are counted.

Hm. I had thought that this was taken into account (and had convinced myself on this based on some anecdotal evidence). We'd be generating 100'000s of downloads daily, so that would make a very substantial chunk of our download numbers.

h-vetinari avatar Apr 29 '25 22:04 h-vetinari

Not at all scientific, and I'm not sure it's informative, but in the time since you posted numbers for openssl 3.5.0 (6 days), the download numbers have increased by:

| platform | increase |
|---|---|
| linux-64 | 631,289 |
| linux-aarch64 | 24,715 |
| linux-ppc64le | 1,389 |
| osx-64 | 33,760 |
| osx-arm64 | 36,817 |
| win-64 | 70,574 |

with this number of non-cancelled builds per target platform on azure in a similar time frame:

| name | count |
|---|---|
| linux-64 | 6,652 |
| linux-aarch64 | 2,540 |
| linux-ppc64le | 1,793 |
| linux-s390x | 5 |
| osx-64 | 4,008 |
| osx-arm64 | 2,805 |
| win-64 | 2,927 |
| win-arm64 | 79 |

As I understand it, during ci setup openssl gets downloaded for the build platform. Assuming all linux builds are cross-compiled (not the case, but probably close), that would only account for ~10k of 630k linux-64 downloads, and 7k of 33k osx-64 downloads. I don't know how common openssl is as a host dependency. Notably, ppc is the only platform where the build count exceeds the total download count. No other platform comes anywhere close.
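The back-of-the-envelope comparison above can be sketched as follows. The two dictionaries are copied from the tables in this comment; the variable names are illustrative only, and the "one openssl download per build, on the build platform" assumption is the same simplification made in the text, not a measured fact.

```python
# Increase in openssl 3.5.0 downloads over ~6 days, per platform (from the first table).
download_increase = {
    "linux-64": 631_289,
    "linux-aarch64": 24_715,
    "linux-ppc64le": 1_389,
    "osx-64": 33_760,
    "osx-arm64": 36_817,
    "win-64": 70_574,
}

# Non-cancelled Azure builds per target platform in a similar time frame (second table).
builds = {
    "linux-64": 6_652,
    "linux-aarch64": 2_540,
    "linux-ppc64le": 1_793,
    "linux-s390x": 5,
    "osx-64": 4_008,
    "osx-arm64": 2_805,
    "win-64": 2_927,
    "win-arm64": 79,
}

# Simplifying assumption: every linux-* target is cross-compiled on linux-64 and
# every osx-* target on osx-64, so each build pulls openssl once for that build platform.
linux_builds = sum(n for name, n in builds.items() if name.startswith("linux-"))
osx_builds = sum(n for name, n in builds.items() if name.startswith("osx-"))

print(f"linux-64: ~{linux_builds} builds vs {download_increase['linux-64']} downloads")
print(f"osx-64:   ~{osx_builds} builds vs {download_increase['osx-64']} downloads")

# Target platforms where the number of builds exceeds the download increase.
exceeds = [p for p in download_increase if builds[p] > download_increase[p]]
print("builds exceed downloads on:", exceeds)
```

Under these assumptions the build-platform downloads come to roughly 11k for linux-64 and 7k for osx-64, and linux-ppc64le is the only target where builds outnumber downloads.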

This obviously doesn't take into account lots of information, so it might be useless:

  • ci builds not on azure
  • no idea how often openssl shows up in build, host, or run dependencies
  • multi-output builds, test env installation
  • caching?

But I think another point in favor of CI installs being counted is that osx-64 is downloaded almost as much as osx-arm64, when ~all macs sold in the last 6 years have been arm, and the arm mac miniforge installer is about 10 times as popular as osx-64, while mac CI builds for conda-forge still all run on osx-64. I suspect conda-forge CI accounts for a pretty large fraction of counted osx-64 downloads.

minrk avatar Apr 30 '25 10:04 minrk

> about 10 times as popular as osx-64

Oops, I missed that there were 2 urls per platform. ARM mac downloads are only 50% higher than Intel.

minrk avatar Apr 30 '25 10:04 minrk

> ARM mac downloads are only 50% higher than Intel.

This is consistent with numbers I'm seeing across the board in recent months[^1].

[^1]: the fact that osx-64 was ahead of osx-arm64 for openssl was an outlier, not the rule

> But I think another point in favor of CI installs being counted is that osx-64 is downloaded almost as much as osx-arm64, when ~all macs sold in the last 6 years have been arm

I've been puzzled by the longevity of the osx-64 numbers, but OTOH, the latest macOS 15 still supports hardware from ~2018 onwards, and these old(er) devices are still around in big numbers[^2].

[^2]: I won't speculate to what degree macro trends like e.g. a worsening economy might influence more people to keep using their old hardware.

> Notably, ppc is the only platform where the build count exceeds the total download count.

Regardless of the exact mechanics (i.e. whether it's downloads from the ci-setup, or build/host/run), I think that's a pretty solid argument that CI downloads are not being counted. :)

h-vetinari avatar Apr 30 '25 11:04 h-vetinari

I know one user of ppc64le, @jeongseok-meta. Perhaps you want to share your "user story" so we can better understand one example of the need for ppc64le packages in the real world.

hmaarrfk avatar Apr 30 '25 12:04 hmaarrfk

I also have a friend who used to work at IBM and used ppc64le, but honestly, jayfurmanek was the ppc64le champion at the time, and I understand he is no longer working there. I'm not sure how much those ppc64le supercomputers are still used there.

hmaarrfk avatar Apr 30 '25 12:04 hmaarrfk

FWIW CUDA dropped ppc64le support entirely by CUDA 12.5. In my (personal) opinion the maintenance overhead of ppc packages on conda-forge is too high and we should drop it too, sooner rather than later.

leofang avatar Apr 30 '25 13:04 leofang

> I think that's a pretty solid argument that CI downloads are not being counted. :)

I'm not sure I follow there. Due to cross compiling, only builds which have openssl in host or run dependencies (or the rare emulated builds) would increment the download count for arm/ppc. If most but not all packages depend on openssl, then slightly lower but similar order is exactly what I'd expect to see if all or at least most CI downloads were counted. So to me, these numbers indicate that CI downloads are counted and also account for a likely very large fraction of ppc installs.
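To make the "slightly lower but similar order" point concrete, here is a small sketch that divides the 6-day openssl download increase by the Azure build count for each target. The numbers are the ones posted earlier in this thread; the interpretation in the comments is the reading argued above, not something the data proves.

```python
# (download increase, Azure builds) per target platform, from the earlier tables.
stats = {
    "linux-64": (631_289, 6_652),
    "linux-aarch64": (24_715, 2_540),
    "linux-ppc64le": (1_389, 1_793),
    "osx-64": (33_760, 4_008),
    "osx-arm64": (36_817, 2_805),
    "win-64": (70_574, 2_927),
}

# Downloads per build: a value near (or below) 1 is what one would expect if
# most downloads for that target came from CI builds that have openssl in
# host or run dependencies, with few downloads from anyone else.
downloads_per_build = {
    platform: increase / n_builds for platform, (increase, n_builds) in stats.items()
}

for platform, ratio in sorted(downloads_per_build.items(), key=lambda kv: kv[1]):
    print(f"{platform:<14} {ratio:6.2f} downloads per build")
```

Only linux-ppc64le comes out below one download per build; linux-aarch64, the next lowest, is at nearly ten.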

Since I seem to be able to reliably at-will increment download counts by triggering CI and watching anaconda.org numbers go up by the exact number of builds, I think we can confidently say that CI downloads are counted, at least up to some point.

minrk avatar Apr 30 '25 14:04 minrk

Yes, for sure CI downloads are counted. Anaconda would have to track and build reject lists for the IP addresses of all of the Microsoft-hosted Azure, Travis, and other CI provider workers to exclude them, which seems a priori impossible or would require so much effort as to not be worth the cost.

beckermr avatar Apr 30 '25 15:04 beckermr

> I know one user of ppc64le, @jeongseok-meta. Perhaps you want to share your "user story" so we can better understand one example of the need for ppc64le packages in the real world.

Thanks for pinging. I don't currently have a use case for ppc64le. I've just attempted to support it by ensuring its availability in the conda-forge ecosystem, and as such, I also have the same question regarding this issue. ;)

jeongseok-meta avatar May 01 '25 01:05 jeongseok-meta

ppc64le is used in supercomputers like Summit, Lassen by maintainers like @matthiasdiener. If a package is having specific problems with ppc64le, that's okay to drop.

isuruf avatar May 01 '25 12:05 isuruf

Would it be acceptable to "split" the migrator for ppc64le away from the aarch64 migrator?

hmaarrfk avatar May 01 '25 12:05 hmaarrfk

Sure, that requires some coding on the bot to do that.

isuruf avatar May 01 '25 13:05 isuruf

I really tried to find where the bot reads the aarch migration file but couldn’t find it. Can you point me to it?

hmaarrfk avatar May 03 '25 17:05 hmaarrfk

It's at https://github.com/regro/cf-scripts/blob/main/conda_forge_tick/migrators/arch.py#L134

isuruf avatar May 05 '25 12:05 isuruf

@rgommers pointed me to https://github.com/IBM/actionspz, a service by IBM to request GHA runners for ppc and others, in case that makes any difference to the discussion here.

jaimergp avatar Jun 06 '25 08:06 jaimergp

Thanks @jaimergp that is useful for those wishing to add support.

I think the steps are the following:

  1. Feedstocks that feel the burden from ppc64le should feel free to drop it in a dedicated PR for traceability.
  2. A champion of PPC64le should ensure that it is a distinct migrator from the aarch64 migrator following isuruf's tip.
  3. Requesters of ppc64le variants should add themselves as maintainers of the feedstocks they are requesting (with a comment explaining that they are requesting ppc64le support).
  4. Look into native runners for ppc64le to help improve maintainer quality of life.

hmaarrfk avatar Jun 06 '25 20:06 hmaarrfk

> ppc64le is used in supercomputers like Summit, Lassen by maintainers like @matthiasdiener. If a package is having specific problems with ppc64le, that's okay to drop.

I used to use ppc64le packages years ago on OLCF Summit. Summit has now been retired. It has been replaced by the amd64 machine Frontier. I'm not aware of any active ppc64le systems in US national HPC centers. I will keep ppc64le builds in the feedstocks I maintain for now, but per this discussion will plan to remove them whenever they require extra work to maintain.

joaander avatar Oct 08 '25 16:10 joaander

Pinging @conda-forge/help-ppc64le and also making folks aware of this related thread in Zulip: #general > Cross-builds on linux-ppc64le are failing

jaimergp avatar Jan 07 '26 14:01 jaimergp