CI: fix workflows
~~fix regressions introduced by PR #949~~
- [x] ~~PR #1038 included.~~ (merged)
misc workflow fixes
restored
restore some changes that were accidentally and carelessly merged over and then abandoned.
- [x] support the cmake `-B build` option correctly.
- [x] use `python-version: "3.10"` for builds
- [x] build a wheel with all available cuda versions.
- [x] ~~use conda+mamba method. disable cuda-toolkits. (the cuda-toolkits is too slow and not stable in some cases.)~~
- [x] use more flexible/customizable cmake options. (set `COMPUTE_CAPABILITY`, for example)
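The restored build options above could look roughly like the following workflow step. This is a sketch only; the job/step names and the capability values are illustrative, not taken from the actual workflow:

```yaml
# Hypothetical build job illustrating the restored options.
jobs:
  build-shared-libs:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.10"
      - name: Configure and build
        run: |
          # -B selects an out-of-tree build directory; COMPUTE_CAPABILITY is
          # passed through as a customizable cmake cache option.
          cmake -B build -DCOMPUTE_CAPABILITY="75;86;89" .
          cmake --build build --config Release
```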
fixed
- [x] fix Jimver/cuda-toolkit speed issue (extracted to #1055) ~~cuda-toolkit seems too slow and even breaks sometimes. (Please see https://github.com/wkpark/bitsandbytes/actions/runs/7793281860/job/21252763742)~~
- [x] fix wheel names for aarch64
- [x] ~~fix docker using the `container: image:` method.~~ use docker-run-actions
- [x] update docker image version
  - fixed deprecation warning for the cuda12.1 docker image (cuda12.1.0 is deprecated but cuda12.1.1 is supported)
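A docker-run style step replacing the `container: image:` method might look like the sketch below. The specific action (`addnab/docker-run-action`) and image tag are assumptions for illustration, not necessarily what this PR uses:

```yaml
# Sketch: run the build inside a CUDA container via a docker-run action
# instead of the job-level `container: image:` method.
- uses: addnab/docker-run-action@v3
  with:
    image: nvidia/cuda:12.1.1-devel-ubuntu22.04
    options: -v ${{ github.workspace }}:/src
    run: |
      cd /src
      cmake -B build .
      cmake --build build
```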
misc
- [x] drop artifact `retention-days:`; it can be configured in Settings -> Actions -> General.
- [x] `concurrency:` fix restored.
- [x] `fail-fast: false` restored.
- [x] use `ilammy/[email protected]` instead of `microsoft/setup-msbuild`. (`microsoft/setup-msbuild` is used for msbuild.)
- [x] use `cmake -G Ninja`: ninja is faster and popular. (triton uses Ninja, for example)
- [x] define `COMPUTE_CAPABILITY` differently
  - for `pull_request`, use the concise `COMPUTE_CAPABILITY=61;75;86;89` (only popular GPUs included)
  - for `refs/tags/*`, use the full capabilities, `COMPUTE_CAPABILITY=50;52;60;61;62;70;72;75;80;86;87;89;90`
  - with this method, build time is reduced from ~40min to ~25min
- [x] build cuda12.1 only for `pull_request`, full build for `refs/tags/*`
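The trigger-dependent capability selection and the restored `concurrency:` block could be sketched as follows (the step name and exact `concurrency` group key are illustrative; the capability values are the ones listed above):

```yaml
# Restored concurrency: cancel superseded runs of the same ref.
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true

# Illustrative step: pick the capability list by trigger.
steps:
  - name: Set COMPUTE_CAPABILITY
    run: |
      if [ "${{ github.event_name }}" = "pull_request" ]; then
        # concise list of popular GPUs for PR builds
        echo "COMPUTE_CAPABILITY=61;75;86;89" >> "$GITHUB_ENV"
      else
        # full list for tagged release builds
        echo "COMPUTE_CAPABILITY=50;52;60;61;62;70;72;75;80;86;87;89;90" >> "$GITHUB_ENV"
      fi
```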
aarch64 build issue
Build time using docker + the aarch64 arch seems too slow; the current total build time is about 30~40min.
- normal build time is about ~5-10min; the aarch64 build on docker is about ~25-30min
- we could use a cross-compiler to fix the build speed issue.
- or we could reduce `COMPUTE_CAPABILITY` for `pull_request`s
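For context, the slowdown comes from running the arm64 container under QEMU user-mode emulation on an x86_64 runner. A typical setup looks like the sketch below (action versions and image are the commonly used ones, not necessarily what this repo's workflows pin):

```yaml
# Sketch: enable arm64 emulation via QEMU/binfmt; everything run with
# --platform linux/arm64 afterwards is emulated, hence the slow builds.
steps:
  - uses: docker/setup-qemu-action@v3
    with:
      platforms: arm64
  - name: Build inside an emulated arm64 container
    run: |
      docker run --rm --platform linux/arm64 \
        -v "$PWD":/src -w /src ubuntu:22.04 \
        bash -c "uname -m && make"   # uname -m reports aarch64 here
```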
outdated or resolved
~~CPU builds with aarch64 gcc/g++ are fine, but for cuda builds there is no benefit, as docker itself does not have arm64 libraries installed and it will fail to link correctly.~~ resolved.
~~Some of @rickardp's changes, whether intended or not, may be better done in python-package.yml. However, since they cause some serious problems and it is important to quickly fix things that were fine before, I am submitting this PR.~~
~~P.S.: PR #1018 is not included.~~
At first glance, this basically reverts a lot of Rickard's work - could you explain why the changes are needed? There also seems to be a bunch of changes that aren't really "fix regression"..?
As I already mentioned, this is a quick and manual revert to fix the current regression state.
The missing _cpu tag might well be a regression, but this PR seems to be putting the Mamba/Conda stuff back in (why?), adding a setuptools and wheel in the shared-libs step, etc.
Wouldn't it be better to just fix things for the better instead of a "quick revert" that someone will need to fix again later? Since there's a lot of activity in this repo just now, and we're working in a state where we don't really have the luxury to actually properly test anything, I think we should accept the fact that main might be broken.
Could you maybe look into other options than mamba? I mean, this repo itself has install_cuda.py and install_cuda.sh scripts which apparently are meant to install the CUDA Toolkit – maybe we should use those?
Some comment from me: possibly #949 was merged too quickly without aligning on the scope. Sorry for not realizing this! May I suggest discussing improvements in the RFCs, maybe in #1031?
I would argue the easiest way to manage CUDA dependencies is through Docker. What is currently lacking is dependabot-managed updates for the CUDA versions. I was actually planning to fix this in a follow-up PR; it involves creating Dockerfiles for the CUDA images, as this is what Dependabot can work with. We need to continuously monitor for updates as NVIDIA deprecates CUDA versions fast, and Dependabot is the realistic way to handle this. The problem here is Windows. While there seem to be some community-based Windows containers, AFAIK hosted runners won't run Windows containers. The community GitHub action does a decent job, but it's slow, owing to the fact that it runs the horribly slow CUDA installer. Some kind of image-based approach (just like Docker for the Linux builds) would be the best solution.
Hey @wkpark, @akx, @matthewdouglas, @rickardp,
Yes, agreed, #949 was merged a bit hastily, due to a misunderstanding and a too-quick trigger finger on my side. Sorry for that! And thanks everyone for the collective cleanup / hotfix actions that ensued and got everything back to working order.
Based on your discussion above, I'm now not sure if this PR is still WIP or if you all agree that it's ready for review and merge? If not, what do we need to still implement or agree on in order to move forward?
TBH I thought this PR was abandoned and most of the stuff had been broken out to other PRs.
I'll have another look, as I am not completely sure what issues remain that this PR solves.
> use `ilammy/[email protected]` instead of `microsoft/setup-msbuild`. (`microsoft/setup-msbuild` is used for msbuild.)
> `concurrency:` fix restored
These are already on master I think
> update docker image version: fixed deprecation warning for the cuda12.1 docker image (cuda12.1.0 is deprecated but cuda12.1.1 is supported)
IMHO this is better handled like #1052, so we can have Dependabot notifying us when there are upgrades.
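Once the CUDA images live in Dockerfiles, a `.github/dependabot.yml` along these lines would let Dependabot track base-image updates (the directory path is an assumption; adjust to wherever the Dockerfiles end up):

```yaml
# Sketch of a Dependabot config covering Docker base images and actions.
version: 2
updates:
  - package-ecosystem: "docker"
    directory: "/docker"        # hypothetical location of the CUDA Dockerfiles
    schedule:
      interval: "weekly"
  - package-ecosystem: "github-actions"
    directory: "/"
    schedule:
      interval: "weekly"
```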
> support the cmake `-B build` option correctly.
@wkpark I think you mentioned this a while back, but I don't see why this is necessary. To me it seems that it only complicates path handling in the cmake files. Source files and output are where they are anyway and intermediates are already git ignored
> we can use a cross-compiler to fix the build speed issue.
CUDA can't be cross-compiled, but we can use native aarch64 agents when they become publicly available.
> use `python-version: "3.10"` for builds
IMHO we should test on all versions we support. But there was an idea floated a while back (possibly by @akx) about splitting building wheels from testing them. The building of wheels is really quick so it won't speed anything up, but it will reduce the storage use.
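The build/test split mentioned above could be sketched like this (job names, Python versions, and paths are illustrative, not from the actual workflow):

```yaml
# Sketch: build wheels once, then fan out testing across Python versions.
jobs:
  build-wheels:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: pip wheel . -w dist
      - uses: actions/upload-artifact@v4
        with:
          name: wheels
          path: dist/
  test:
    needs: build-wheels
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ["3.8", "3.9", "3.10", "3.11"]
    steps:
      - uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}
      - uses: actions/download-artifact@v4
        with:
          name: wheels
      - run: pip install *.whl && pytest
```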
> fix wheel names for aarch64
> drop artifact `retention-days:`; it can be configured in Settings -> Actions -> General.
> `fail-fast: false` restored.
I suggest we make these separate PRs. I think we want to set retention days separately per artifact type, as some artifacts (wheels) are more useful than others (.so files). But agreed, these need tuning.
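For reference, per-artifact retention is available via the `retention-days` input of `actions/upload-artifact`, independently of the repo-wide default in Settings -> Actions -> General (artifact names and paths below are illustrative):

```yaml
# Sketch: different retention per artifact type.
- uses: actions/upload-artifact@v4
  with:
    name: wheels
    path: dist/*.whl
    retention-days: 90   # wheels are worth keeping longer
- uses: actions/upload-artifact@v4
  with:
    name: shared-libs
    path: build/*.so
    retention-days: 7    # intermediates can expire sooner
```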
> use `cmake -G Ninja`: ninja is faster and popular. (triton uses Ninja, for example)
Same here. But I don't know if the speed of the build tool will matter, as most of the time is spent in nvcc. And it's an additional moving part. But maybe we can isolate the change and see its impact?
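One way to isolate the generator change and compare its impact, as suggested, would be to matrix over the generator (a sketch; the matrix values are illustrative):

```yaml
# Sketch: build the same sources with both generators and compare job times.
strategy:
  matrix:
    generator: ["Unix Makefiles", "Ninja"]
steps:
  - uses: actions/checkout@v4
  - name: Configure and build
    run: |
      cmake -G "${{ matrix.generator }}" -B build .
      cmake --build build
```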
Several PRs have already been extracted and merged. We can consider this PR a reference or a source of ideas; I do not intend to get it merged without further discussion or agreement.