CI: fix workflows
~~fix regressions introduced by PR #949~~
- [x] ~~PR #1038 included.~~ (merged)
misc workflow fixes
restored
restore some changes that were accidentally and carelessly merged over and then abandoned.
- [x] support the cmake `-B build` option correctly.
- [x] use `python-version: "3.10"` for builds
- [x] build a wheel with all available cuda versions.
- [x] ~~use conda+mamba method. disable cuda-toolkits. (the cuda-toolkits is too slow and not stable in some cases.)~~
- [x] use more flexible/customizable cmake options. (set `COMPUTE_CAPABILITY`, for example)
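The restored build options above could look roughly like the following workflow step. This is a sketch only; the job/step names and the capability values are illustrative, not taken from the actual workflow:

```yaml
# Hypothetical build job illustrating the restored options.
jobs:
  build-shared-libs:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.10"
      - name: Configure and build
        run: |
          # -B selects an out-of-tree build directory; COMPUTE_CAPABILITY is
          # passed through as a customizable cmake cache option.
          cmake -B build -DCOMPUTE_CAPABILITY="75;86;89" .
          cmake --build build --config Release
```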
fixed
- [x] fix Jimver/cuda-toolkit speed issue (extracted to #1055) ~~cuda-toolkit seems too slow and even breaks sometimes. (Please see https://github.com/wkpark/bitsandbytes/actions/runs/7793281860/job/21252763742)~~
- [x] fix wheel names for aarch64
- [x] ~~fix docker using the `container: image:` method.~~ use docker-run-actions
- [x] update docker image version
  - fixed deprecation warning for the cuda12.1 docker image (cuda12.1.0 is deprecated but cuda12.1.1 is supported)
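A docker-run style step replacing the `container: image:` method might look like the sketch below. The specific action (`addnab/docker-run-action`) and image tag are assumptions for illustration, not necessarily what this PR uses:

```yaml
# Sketch: run the build inside a CUDA container via a docker-run action
# instead of the job-level `container: image:` method.
- uses: addnab/docker-run-action@v3
  with:
    image: nvidia/cuda:12.1.1-devel-ubuntu22.04
    options: -v ${{ github.workspace }}:/src
    run: |
      cd /src
      cmake -B build .
      cmake --build build
```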
misc
- [x] drop artifact `retention-days:`; it can be configured in Settings -> Actions -> General.
- [x] `concurrency:` fix restored.
- [x] `fail-fast: false` restored.
- [x] use `ilammy/[email protected]` instead of `microsoft/setup-msbuild`. (`microsoft/setup-msbuild` is used for msbuild.)
- [x] use `cmake -G Ninja`: ninja is faster and popular. (triton uses Ninja, for example)
- [x] define `COMPUTE_CAPABILITY` differently
  - for `pull_request`, use the concise `COMPUTE_CAPABILITY=61;75;86;89` (only popular GPUs included)
  - for `refs/tags/*`, use the full capabilities, `COMPUTE_CAPABILITY=50;52;60;61;62;70;72;75;80;86;87;89;90`
  - with this method, build time is reduced from ~40min to ~25min
- [x] build cuda12.1 only for `pull_request`, full build for `refs/tags/*`
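The trigger-dependent capability selection and the restored `concurrency:` block could be sketched as follows (the step name and exact `concurrency` group key are illustrative; the capability values are the ones listed above):

```yaml
# Restored concurrency: cancel superseded runs of the same ref.
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true

# Illustrative step: pick the capability list by trigger.
steps:
  - name: Set COMPUTE_CAPABILITY
    run: |
      if [ "${{ github.event_name }}" = "pull_request" ]; then
        # concise list of popular GPUs for PR builds
        echo "COMPUTE_CAPABILITY=61;75;86;89" >> "$GITHUB_ENV"
      else
        # full list for tagged release builds
        echo "COMPUTE_CAPABILITY=50;52;60;61;62;70;72;75;80;86;87;89;90" >> "$GITHUB_ENV"
      fi
```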
aarch64 build issue
Build time using docker + the aarch64 arch seems too slow; the current total build time is about 30~40min.
- normal build time is about ~5-10min; the aarch64 build on docker is about ~25-30min
- we could use a cross-compiler to fix the build speed issue.
- or we could reduce `COMPUTE_CAPABILITY` for `pull_request`s
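For context, the slowdown comes from running the arm64 container under QEMU user-mode emulation on an x86_64 runner. A typical setup looks like the sketch below (action versions and image are the commonly used ones, not necessarily what this repo's workflows pin):

```yaml
# Sketch: enable arm64 emulation via QEMU/binfmt; everything run with
# --platform linux/arm64 afterwards is emulated, hence the slow builds.
steps:
  - uses: docker/setup-qemu-action@v3
    with:
      platforms: arm64
  - name: Build inside an emulated arm64 container
    run: |
      docker run --rm --platform linux/arm64 \
        -v "$PWD":/src -w /src ubuntu:22.04 \
        bash -c "uname -m && make"   # uname -m reports aarch64 here
```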
outdated or resolved
~~CPU builds with aarch64 gcc/g++ are fine, but for cuda builds there is no benefit, as docker itself does not have arm64 libraries installed and it will fail to link correctly.~~ resolved.
~~Some of @rickardp's changes, whether intended or not, may be better done in python-package.yml. However, since they cause some serious problems and it is important to quickly fix things that were fine before, I am submitting this PR.~~
~~P.S.: PR #1018 is not included.~~
At first glance, this basically reverts a lot of Rickard's work - could you explain why the changes are needed? There also seems to be a bunch of changes that aren't really "fix regression"..?
As I already mentioned, this is a quick and manual revert to fix the current regression state.
The missing _cpu tag might well be a regression, but this PR seems to be putting the Mamba/Conda stuff back in (why?), adding a setuptools and wheel in the shared-libs step, etc.
Wouldn't it be better to just fix things for the better instead of a "quick revert" that someone will need to fix again later? Since there's a lot of activity in this repo just now, and we're working in a state where we don't really have the luxury to actually properly test anything, I think we should accept the fact that main might be broken.
Could you maybe look into other options than mamba? I mean, this repo itself has install_cuda.py and install_cuda.sh scripts which apparently are meant to install the CUDA Toolkit – maybe we should use those?
Some comment from me: possibly #949 was merged too quickly without aligning on the scope. Sorry for not realizing this! May I suggest discussing improvements in the RFCs, maybe in #1031?
I would argue the easiest way to manage CUDA dependencies is through Docker. What is currently lacking is dependabot-managed updates for the CUDA versions. I was actually planning to fix this in a follow-up PR; it involves creating Dockerfiles for the CUDA images, as this is what Dependabot can work with. We need to continuously monitor for updates as NVIDIA deprecates CUDA versions fast, and Dependabot is the realistic way to handle this. The problem here is Windows. While there seem to be some community-based Windows containers, AFAIK hosted runners won't run Windows containers. The community GitHub action does a decent job, but it's slow, owing to the fact that it runs the horribly slow CUDA installer. Some kind of image-based approach (just like Docker for the Linux builds) would be the best solution.
Hey @wkpark, @akx, @matthewdouglas, @rickardp,
Yes, agreed, #949 was merged a bit hastily, due to a misunderstanding and a too-quick trigger finger on my side. Sorry for that! And thanks everyone for the collective cleanup / hotfix actions that ensued and got everything back to working order.
Based on your discussion above, I'm now not sure if this PR is still WIP or if you all agree that it's ready for review and merge? If not, what do we need to still implement or agree on in order to move forward?
TBH I thought this PR was abandoned and most of the stuff had been broken out to other PRs.
I'll have another look, as I am not completely sure what issues remain that this PR solves.
> use `ilammy/[email protected]` instead of `microsoft/setup-msbuild`. (`microsoft/setup-msbuild` is used for msbuild.)
> `concurrency:` fix restored
These are already on master I think
> update docker image version: fixed deprecation warning for the cuda12.1 docker image (cuda12.1.0 is deprecated but cuda12.1.1 is supported)
IMHO this is better handled like #1052, so we can have Dependabot notifying us when there are upgrades.
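Once the CUDA images live in Dockerfiles, a `.github/dependabot.yml` along these lines would let Dependabot track base-image updates (the directory path is an assumption; adjust to wherever the Dockerfiles end up):

```yaml
# Sketch of a Dependabot config covering Docker base images and actions.
version: 2
updates:
  - package-ecosystem: "docker"
    directory: "/docker"        # hypothetical location of the CUDA Dockerfiles
    schedule:
      interval: "weekly"
  - package-ecosystem: "github-actions"
    directory: "/"
    schedule:
      interval: "weekly"
```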
> support the cmake `-B build` option correctly.
@wkpark I think you mentioned this a while back, but I don't see why this is necessary. To me it seems that it only complicates path handling in the cmake files. Source files and output are where they are anyway and intermediates are already git ignored
> we can use a cross-compiler to fix the build speed issue.
CUDA can't be cross-compiled, but we can use native aarch64 agents when they become publicly available.
> use `python-version: "3.10"` for builds
IMHO we should test on all versions we support. But there was an idea floated a while back (possibly by @akx) about splitting building wheels from testing them. The building of wheels is really quick so it won't speed anything up, but it will reduce the storage use.
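The build/test split mentioned above could be sketched like this (job names, Python versions, and paths are illustrative, not from the actual workflow):

```yaml
# Sketch: build wheels once, then fan out testing across Python versions.
jobs:
  build-wheels:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: pip wheel . -w dist
      - uses: actions/upload-artifact@v4
        with:
          name: wheels
          path: dist/
  test:
    needs: build-wheels
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ["3.8", "3.9", "3.10", "3.11"]
    steps:
      - uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}
      - uses: actions/download-artifact@v4
        with:
          name: wheels
      - run: pip install *.whl && pytest
```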
> fix wheel names for aarch64
> drop artifact `retention-days:`; it can be configured in Settings -> Actions -> General.
> `fail-fast: false` restored.
I suggest we make these separate PRs. I think we want to set retention days separately per artifact type, as some artifacts (wheels) are more useful than others (.so files). But agreed, these need tuning.
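For reference, per-artifact retention is available via the `retention-days` input of `actions/upload-artifact`, independently of the repo-wide default in Settings -> Actions -> General (artifact names and paths below are illustrative):

```yaml
# Sketch: different retention per artifact type.
- uses: actions/upload-artifact@v4
  with:
    name: wheels
    path: dist/*.whl
    retention-days: 90   # wheels are worth keeping longer
- uses: actions/upload-artifact@v4
  with:
    name: shared-libs
    path: build/*.so
    retention-days: 7    # intermediates can expire sooner
```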
> use `cmake -G Ninja`: ninja is faster and popular. (triton uses Ninja, for example)
Same here. But I don't know if the speed of the build tool will matter, as most of the time is spent in nvcc. And it's an additional moving part. But maybe we can isolate the change and see its impact?
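One way to isolate the generator change and compare its impact, as suggested, would be to matrix over the generator (a sketch; the matrix values are illustrative):

```yaml
# Sketch: build the same sources with both generators and compare job times.
strategy:
  matrix:
    generator: ["Unix Makefiles", "Ninja"]
steps:
  - uses: actions/checkout@v4
  - name: Configure and build
    run: |
      cmake -G "${{ matrix.generator }}" -B build .
      cmake --build build
```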
Several PRs have already been extracted and merged. We can consider this PR a reference or a source of ideas; I do not intend to get it merged without further discussion or agreement.