feat: enable cuda on arm for jetson boards
ci-extra: build_test_all_arm64
@ScottTodd is it possible to run build_test_all_arm64 on this PR? It was skipped. The runtime seems to include CUDA now https://github.com/iree-org/iree/actions/runs/9363507812/job/25774464868?pr=17564#step:9:163
https://iree.dev/developers/general/contributing/#ci-behavior-manipulation - use `ci-extra` (or `ci-exactly`) in the commit message, then push a commit to retrigger.
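For anyone landing here later, a minimal sketch of that flow (the trailer names come from the linked docs; the commit message text itself is arbitrary):

```bash
# Add the opt-in trailer in the commit message body, then push to retrigger.
# An empty commit is enough if there is nothing new to commit.
git commit --allow-empty -m "ci: retrigger" -m "ci-extra: build_test_all_arm64"
git push
```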
Nice, looks like this worked. I see CUDA enabled in https://github.com/iree-org/iree/actions/runs/9371176047/job/25799891554?pr=17564 and some compiler tests with no runtime / hardware requirements using it ran and passed.
note that this may edge into non-default path territory - downloading/compiling cuda stuff for one particular board vs the more common arm systems is a big addition to the default flow (iow, this makes all arm users download/build cuda, whereas only jetson-like users may care)
Yeah, it's even trickier because we check for NOT IREE_CUDA_AVAILABLE OR CMAKE_CROSSCOMPILING for changing the default - so cross-compiling for jetson won't have CUDA
I'm going to suggest such users enable the flag and don't rely on the default. Given the current status of the CUDA target being opt-in for unsupported platforms is safer.
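For a Jetson-style cross-compile, a minimal sketch of that explicit opt-in (the toolchain file path is hypothetical; the two `IREE_*` options are the real flags discussed here):

```bash
# Opt in explicitly rather than relying on the computed default, which is
# forced OFF when cross-compiling or when CUDA is not detected as available.
cmake -G Ninja -B ../iree-build-jetson -S . \
  -DCMAKE_TOOLCHAIN_FILE=/path/to/aarch64-jetson.toolchain.cmake \
  -DIREE_TARGET_BACKEND_CUDA=ON \
  -DIREE_HAL_DRIVER_CUDA=ON
```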
The motivation for this was here on Discord
> Compiling IREE on the jetson board with 4gb of RAM was not really fun...
The compiler doesn't necessarily need to run on the board itself, so I'm not sure if we'd need to enable the cuda compiler for our package builds to address the user pain there 🤔
The only thing IREE_CUDA_AVAILABLE does is change the default flags for compiler and runtime backend inclusion. I don't see why we'd want to change the default for all source builds to add something to a release package. Whatever is building the package should do it - we want packages to have explicitly included things, not rely on defaults.
Hmmm... points taken. I'm on the fence here. If we just want to opt the release build in to including CUDA and leave developer builds stuck with the previous behavior (detected as not available -> defaulted to off), we could pull some changes back from https://github.com/iree-org/iree/pull/14611
```diff
# build_tools/python_deploy/build_linux_packages.sh
function build_iree_compiler() {
  # (always enable? opt-in based on arch?)
+ IREE_TARGET_BACKEND_CUDA=$(uname -m | awk '{print ($1 == "x86_64") ? "ON" : "OFF"}') \
  build_wheel compiler/
}
```

```diff
# compiler/setup.py
  "-DPython3_EXECUTABLE={}".format(sys.executable),
  "-DCMAKE_BUILD_TYPE={}".format(cfg),
+ get_env_cmake_option("IREE_TARGET_BACKEND_CUDA"),
```
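For reference, with a change like that applied, a local wheel build could opt in through the environment, roughly as follows (the exact invocation the script's `build_wheel` helper uses may differ):

```bash
# get_env_cmake_option() in setup.py forwards the env var to CMake as
# -DIREE_TARGET_BACKEND_CUDA=ON, so this only works with the diff above.
IREE_TARGET_BACKEND_CUDA=ON python -m pip wheel -v compiler/
```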
I get the arguments about not building CUDA for every aarch64 source build, but only when building the packages. I will update the PR with the pointer from @ScottTodd.
> note that this may edge into non-default path territory - downloading/compiling cuda stuff for one particular board vs the more common arm systems is a big addition to the default flow (iow, this makes all arm users download/build cuda, whereas only jetson-like users may care)
To be fair, Jetson boards are super common in the edge AI space. There are also ARM workstations coming on the market that seem to support external GPUs. So having CUDA at least in the AArch64 package would be nice.
> Hmmm... points taken. I'm on the fence here. If we just want to opt the release build in to including CUDA and leave developer builds stuck with the previous behavior (detected as not available -> defaulted to off), we could pull some changes back from #14611
I looked into this and we would still need `IREE_CUDA_AVAILABLE` to be able to build the CUDA backend because of
```cmake
# Supported default target backends that are only available on certain
# platforms.
set(IREE_TARGET_BACKEND_CUDA_DEFAULT ${IREE_TARGET_BACKEND_DEFAULTS})
if(NOT IREE_CUDA_AVAILABLE)
  set(IREE_TARGET_BACKEND_CUDA_DEFAULT OFF)
endif()
cmake_dependent_option(IREE_TARGET_BACKEND_CUDA "Enables the 'cuda' compiler target backend" ${IREE_TARGET_BACKEND_CUDA_DEFAULT} ${IREE_BUILD_COMPILER} OFF)
```
in the main CMakeLists.txt.
Reading through that file in a bit more detail, I think having CUDA disabled on ARM is actually the inconsistent behavior. The main reasons for that:
- It is stated several times that CUDA can/will be downloaded when on a supported platform, and AArch64 has CUDA support. See:
- https://github.com/iree-org/iree/blob/58feff319e2fd0dff7909358741a26ffa5807823/CMakeLists.txt#L200-L206
- https://github.com/iree-org/iree/blob/58feff319e2fd0dff7909358741a26ffa5807823/CMakeLists.txt#L215-L216
- https://github.com/iree-org/iree/blob/58feff319e2fd0dff7909358741a26ffa5807823/CMakeLists.txt#L263-L265
- https://github.com/iree-org/iree/blob/58feff319e2fd0dff7909358741a26ffa5807823/CMakeLists.txt#L382-L388
- The documentation on https://iree.dev/guides/deployment-configurations/gpu-cuda/#prerequisites says "The core iree-compiler package includes the CUDA compiler" (note that this cannot be true for macOS; I added a clarification for this).
- The ROCM/HIP backend and driver is also enabled on AArch64 even if there are very few systems supporting it. I don't see a big difference here between CUDA and ROCM.
I will therefore mark this PR as ready for review.
> I looked into this and we would still need `IREE_CUDA_AVAILABLE` to be able to build the CUDA backend because of
All that controls is the default value? You should still be able to enable it if the "available" check fails?
> - The ROCM/HIP backend and driver is also enabled on AArch64 even if there are very few systems supporting it. I don't see a big difference here between CUDA and ROCM.
I agree there - the logic and policies should be pretty similar.
> All that controls is the default value? You should still be able to enable it if the "available" check fails?
Ah, I think I misread https://cmake.org/cmake/help/latest/module/CMakeDependentOption.html (especially the part about exposing the option to the user). I think it should work with `IREE_HAL_DRIVER_CUDA` and `IREE_TARGET_BACKEND_CUDA` in build_tools/python_deploy/build_linux_packages.sh.
So which path should we take? I would prefer the one in this PR, but I am also open to doing this only for the packages. If we go that route, we should probably change ROCm to the same behavior in a follow-up PR. @ScottTodd
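For concreteness, a minimal sketch of the packages-only route using the two options named above, assuming the existing `build_wheel` helper in build_tools/python_deploy/build_linux_packages.sh and a runtime counterpart:

```bash
# Enable CUDA explicitly for the package builds instead of inheriting the
# IREE_CUDA_AVAILABLE-derived default (which stays OFF for source builds).
function build_iree_compiler() {
  IREE_TARGET_BACKEND_CUDA=ON \
  build_wheel compiler/
}

function build_iree_runtime() {
  IREE_HAL_DRIVER_CUDA=ON \
  build_wheel runtime/
}
```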
I have no idea why ROCM got enabled by default on arm64 - that's silly :) We should have ROCM track what the default for CUDA is and then require users to opt in if they want the backends that require SDKs. Now, if ROCM/CUDA only require headers, that's a different thing, but so long as they require SDKs we have to be careful - we don't want to become tf by bloating things out for all users by default, after all. It's a very slippery slope to "all backends enabled all the time regardless of usage" - most users doing source builds won't be targeting all backends, and adding a cmake option clearly referenced in the target-specific getting started guide is the easiest part of doing a source build.
I can do that. I will move CUDA to that behavior in this PR then and try to move rocm in a follow-up. We probably also need to change some comments (for example the ones I posted) and maybe some docs.
"That behavior" meaning that it is disabled by default on ARM (maybe even all?) builds, but the packages include them.
Does that sound about right @benvanik @ScottTodd ?
The discussion here bounced around a bit. Let me try to collect my thoughts...
In PRs like https://github.com/iree-org/iree/pull/11976 and https://github.com/iree-org/iree/pull/14611, we moved towards more robust build system logic for getting prerequisite files from either the system or from an automatic download for CUDA. Once the build system logic was reliable enough, we enabled CUDA by default on all supported platforms and changed release scripts from being opt-in to just relying on the new defaults.
I haven't looked as closely at the ROCm/HIP side, but there is some logic in compiler/plugins/target/ROCM/CMakeLists.txt to download some .bc files. That doesn't look like a full SDK in the way that CUDA has one.
Enabling compiler target backends and runtime HAL drivers by default (downloading deps automatically as needed) on supported platforms has the advantage of developer builds, CI test builds, and CI release builds all using the same settings. It also has the disadvantage of increasing build time, binary size, and overall complexity for users that don't need those features.
Some key metrics to optimize for in build system configuration are predictability and convenience. We could make the decision to have hardware backends be disabled by default (predictable), or we could introduce more conditional checks based on platforms where the community gets the most value on average (convenient).
I'm still not sure which outcome I'd like here. The automatic downloading is pretty robust and I think the costs of being enabled by default are pretty low.
I think regardless of choice we need to not tie defaults to what the CI/bots/packaging systems do - those should never use defaults. The big red flag here for me is that changing a source build default (the only thing the defaults should be used for) is being used to control packaging. The only bots that should use defaults are smoketest bots for user flows ("if a user checks out a build with no settings, can we build?").
So completely split the argument: CI/packaging/etc uses no defaults at all, and all we're talking about is defaults for source builds. If source builds require a few small deps then I care less about what the default is - but I do reiterate that it's a slippery slope in terms of what we build by default. Many of us are on 64-96 core machines with 128+GB of RAM. Not all users are. Consider first exposure: you check out IREE and build for the first time and it takes a very long time building files you don't care about and a lot more hard drive space than you'd expect - are you going to go dig through CMakeLists.txt to find which settings to disable, or are you going to call it bloatware like tf? I know the answer :) Now, if you're a first-time user and want to target CUDA specifically you should hit the getting started CUDA guide which has the right settings (you need to hit this guide anyway to set compiler flags, so it's on the path), or check the cmake output saying "CUDA DISABLED, use IREE_BLAHBLAH to enable," etc.
Sorry, sent without the last sentence: we can enable a lightweight rocm by default in source builds if everyone thinks that's best - I don't, but I'm just one vote :) I do think we need to not enable it for making packages include rocm - those should be set explicitly as part of the packaging step.
Switching at least the package builds to explicitly enable features SGTM. For CI builds, I'll be okay with that if we at least check that the default configuration can configure without errors - more ideas relating to that here: https://github.com/iree-org/iree/issues/17136#issuecomment-2145579825.
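As a sketch of that minimum bar (build directory name assumed), a smoketest job could simply verify that a configure with no explicit options succeeds:

```bash
# Configure with defaults only; fail the job if the default config breaks.
cmake -G Ninja -B ../iree-build-default -S . && echo "default configure OK"
```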
@benvanik I updated the PR to only include CUDA for aarch64 when building the package. I would like to post a PSA on the mailing list before changing the default build behavior, and introduce that change in a follow-up PR. WDYT?
@benvanik Could I get another review/opinion on this PR please? :)
As someone who has tried at various points to build with everything disabled, I think it would be better for that to be the default in the source. And then we have some documented/checked in way to enable the kitchen sink and that is used consistently in CI and packaging.
On the cuda/hip front, neither should require the SDK but cuda is currently downloading a ton of stuff to extract the one file it needs (because licensing / none of that stuff is open source). So I think hip is in a better situation here. I don't think either should be enabled in a default build from source config (and neither should other things).
Sounds like we found consensus then. However, it doesn't feel like this PR is the correct place to make that change. I would create a follow-up PR, with a PSA on the mailing list, to disable HIP and CUDA in a default build. Otherwise it could confuse folks who have always used the default build for CUDA/HIP. It would have confused me for sure...
This PR would then still be about enabling CUDA for the iree python packages for arm. Could I get a review from that perspective in mind? 🙂
The incremental nature of this change is giving me some pause. I don't want to hold this up for too long, but it sounds to me like making the changes across all backends + build configurations first would be easier to follow.
Can we close this now that https://github.com/iree-org/iree/pull/17996 and https://github.com/iree-org/iree/pull/18438 are merged?