
Build for `musllinux` on `ARM`

Open • korverdev opened this issue 1 year ago • 13 comments

PR summary

Closes https://github.com/matplotlib/matplotlib/issues/28543. This doubles build times in CI, so it may not be the ideal solution. If build times are a concern, I can look into splitting this work across more runners.

PR checklist

korverdev • Jul 19 '24 00:07

It seems Codecov is failing spuriously on this.

korverdev • Jul 19 '24 01:07

Do not worry about the codecov failures. They are a bit random...

oscargus • Jul 19 '24 09:07

Closing and reopening to see if the builds can actually be triggered...

oscargus • Jul 19 '24 10:07

97 minutes is probably a bit too much. I'm not sure I fully followed the discussion in #28543, but running it in a separate worker would at least be a good thing.

oscargus • Jul 19 '24 12:07

@oscargus I can absolutely do that. To make sure we're aligned on what's expected here, I'd be inclined to take this a step further and separate the builds by Python version. That would bring build times under 30 minutes, a significant improvement over the current runtime.

There is one potential concern with this approach: it doesn't change the number of server minutes used and can actually increase costs, since billing is rounded up to the nearest minute and you'd be paying the startup overhead more often. I don't believe y'all have to pay for your CI minutes as an open-source repo, so this should be a non-issue. Does this resolution seem acceptable to you, or would you prefer I stick to your initial proposal of splitting ARM musllinux out into its own runner?
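To make the billing trade-off concrete, here is a minimal Python sketch. It assumes each job is billed by rounding its own duration up to a whole minute and that every job, split or not, pays a fixed startup overhead. The 1.5-minute overhead and the even four-way split are hypothetical numbers, not measurements; the 97 minutes is the runtime quoted earlier in this thread, treated here as pure build work.

```python
import math


def billed_minutes(job_durations):
    """Total billed minutes when each job is rounded up to a whole minute."""
    return sum(math.ceil(d) for d in job_durations)


def wall_clock(job_durations):
    """Elapsed time if all jobs run concurrently (parallel-job cap not hit)."""
    return max(job_durations)


# Hypothetical scenario: 97 minutes of build work in one job, versus the same
# work split evenly across 4 jobs. Every job pays the same startup overhead.
TOTAL_WORK = 97.0
OVERHEAD = 1.5  # assumed per-job startup cost, not a measured value

single = [TOTAL_WORK + OVERHEAD]
split = [TOTAL_WORK / 4 + OVERHEAD] * 4

print(wall_clock(single), billed_minutes(single))  # 98.5 99
print(wall_clock(split), billed_minutes(split))    # 25.75 104
```

Under these made-up numbers the split finishes roughly four times sooner but bills a few more minutes, because the overhead is paid four times and each job is rounded up separately.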

korverdev • Jul 19 '24 13:07

I think it is better that someone "more senior" replies to this. The only thing I am pretty sure about is that a 97 minutes CI job will not be appreciated...

(But what you say makes sense to me, and the idea of parallelizing over versions occurred to me as well. And yes, I do think GitHub minutes are free, although the number of parallel jobs is limited, to something like 20.)

oscargus • Jul 19 '24 13:07

Makes sense. One other thing we may want to consider is that ARM runners are coming down the pipeline eventually. They're currently in public beta and not available for free to open-source projects, but I imagine GitHub has a lot of incentive to move projects like this off QEMU, which burns quadruple the server minutes they'd otherwise use on a native ARM platform.

Totally selfishly, I don't want to wait on a beta with an unknown timeline to make it over the line, but we should probably make sure that whatever we decide on here serves that eventuality as well and is in the best long-term interests of the project.

korverdev • Jul 20 '24 01:07

@oscargus Just following up on this. Could you nudge someone who's able to steer me in the right direction on how to address your concerns?

korverdev • Aug 07 '24 02:08

Sorry, I was hoping that someone would pop in. I've added it to the agenda for the next dev call (which is tomorrow evening; feel free to join: https://hackmd.io/l9vkn_T4RSmk147H_ZPPBA). I will try to attend, but I hope someone else will provide feedback if I cannot make it.

oscargus • Aug 07 '24 14:08

Does this CI just get run once a day, or only during the release cycle? If so, I'm not sure an hour would be a problem. @QuLogic would be a good person to comment if he has time.

jklymak • Aug 07 '24 15:08

This runs on every commit (i.e. merge) to main.

Most of the time it is not directly looked at, but when it is, that can get quite frustrating (e.g. when we are debugging wheel or build-process steps, such as the recent problems with Windows wheels). The different platforms upload separately, which limits the scope of the frustration significantly, but it is still potentially there.

The other place where the increased time is at least potentially problematic is the actual release process. This is mitigated to a degree by increased automation of some release steps: uploading wheels is largely automated, though I think we now have a "press button to confirm first" step that allows us to do specific tests on the wheels before uploading (@QuLogic went through that process in full for the first time just yesterday).

One point of discussion has been the future sustainability of supporting this build. While I suspect problems are likely to be rare (we already build ARM and musl wheels separately, just not together), the dev team is mildly hesitant to add a platform that none of us use. @korverdev, are you (or anyone else, for that matter) willing to champion any platform-specific problems that we may encounter? Having someone specific to ping who is responsive to any problems that arise would ease that hesitancy. That, I believe, is a bigger sticking point than the time.

As for time, there are a couple of things we could do (which could even be combined):

  • Further parallelization, as you noted above.
    • I may lean towards not going all the way to one wheel per job, but I'm not quite sure how far to split it up; maybe just split "arm glibc" and "arm musl" into separate jobs.
  • Using native ARM GitHub runners instead of QEMU on top of standard x64 runners
    • These are available (though I think still considered beta), but may require setting up payments, as I do not see anything that allows their (free) use on the standard OSS GitHub plan.
    • I am pretty sure they would be faster, though not sure by how much. My estimate is twice as fast, which I think would put us at around $0.25 per build.
  • Limiting these builds to a schedule instead of every merge to main
    • Would retain the ability to specifically build on PRs or manually from main
    • Would have to consider frequency
    • Could do more complicated things like "tier 1 support runs on all pushes to main, tier 2 support runs on schedule", but that may run into problems with nightly wheels sometimes seeing tier 2 and sometimes only tier 1.
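The splitting and scheduling options in this list can be sketched as a small Python model. This is hypothetical and not the project's actual workflow: it assumes cibuildwheel-style build identifiers such as `cp312-musllinux_aarch64`, the Python tags are placeholders, and the tier assignment (glibc as tier 1, musl as tier 2) is only an example. Event names follow the GitHub Actions trigger names `push`, `schedule`, and `workflow_dispatch`.

```python
from collections import defaultdict

# Placeholder targets; the real list comes from the project's build config.
PYTHON_TAGS = ["cp310", "cp311", "cp312", "cp313"]
ARM_PLATFORMS = ["manylinux_aarch64", "musllinux_aarch64"]

# Hypothetical support tiers: tier 1 builds on every push to main,
# tier 2 only on a scheduled or manual (workflow_dispatch) run.
TIERS = {"manylinux_aarch64": 1, "musllinux_aarch64": 2}
TIER2_EVENTS = {"schedule", "workflow_dispatch"}


def build_ids():
    """All identifiers, e.g. 'cp312-musllinux_aarch64'."""
    return [f"{py}-{plat}" for py in PYTHON_TAGS for plat in ARM_PLATFORMS]


def split_jobs(ids, by):
    """Group identifiers into CI jobs keyed by 'platform' or 'python'."""
    if by not in ("platform", "python"):
        raise ValueError(f"unknown split key: {by!r}")
    jobs = defaultdict(list)
    for ident in ids:
        py, plat = ident.split("-", 1)
        jobs[plat if by == "platform" else py].append(ident)
    return dict(jobs)


def should_build(ident, event):
    """Tier 1 always builds; tier 2 only on scheduled or manual runs."""
    plat = ident.split("-", 1)[1]
    return TIERS.get(plat, 1) == 1 or event in TIER2_EVENTS


for key, ids in split_jobs(build_ids(), by="platform").items():
    print(key, len(ids))
print([i for i in build_ids() if should_build(i, "push")])
```

Splitting by `"platform"` gives the two-job "arm glibc" / "arm musl" option, while splitting by `"python"` gives one job per interpreter. The tier check shows why nightly wheels could differ depending on whether the last run was a push or a scheduled build.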

ksunden • Aug 07 '24 16:08

@ksunden I unfortunately cannot attend that call this week. I am willing to try my best to champion any platform-specific problems and intend to immediately start using this build for our company if it's made available, so would be one of the first impacted by any issues. Fair warning though, I have never taken such an active role in an open-source project before, so it'll be a learning experience for me. I am pretty active on GitHub though, so I'll be quick to respond to pings.

As for options to move forward with this, I can parallelize this as you've described. If ARM runners are made available to me, I think that's the best long-term solution, and I can take responsibility for implementing it. Based on my experience, I think the runtime improvement will be much more significant than you're estimating. I don't think it's my place to limit when these builds run without prior sign-off, but that also makes sense to me if that's what y'all settle on. We take a similar approach at my company and just provide an option to run the GitHub Action manually with one button click if the need arises. It was rather easy to implement, and I'm absolutely willing to help with that or hand it off to someone with more experience working with these actions.

korverdev • Aug 08 '24 15:08

Hello! Could someone make this available as a Home Assistant wheel? I want to use it there, but only a Python 3.11 build is available, while Home Assistant is already using 3.12. In case it helps, here is the link to the Python 3.11 version: https://wheels.home-assistant.io/musllinux-index/matplotlib/

dobakszilard • Aug 19 '24 05:08