AdaptiveCpp icon indicating copy to clipboard operation
AdaptiveCpp copied to clipboard

Make OpenSYCL live at HEAD

Open aaronmondal opened this issue 3 years ago • 6 comments

Inspired by recent events (#765), (#766), (#709), I thought it would be a good idea to kick off a discussion related to upstream compatibilities for hipSYCL.

Main Point

HipSYCL is currently built and tested against various versions of Clang/LLVM (11 - 14 at the time of writing). I feel like it would make sense to add something like a "weekly build" (because nightly is overkill for the amount of commits in this repo).

I also think that it would make sense to provide a clear "current main compatibility target", and potentially set that target to "upstream".

Suggestions

  • A weekly build could use the latest version of Clang/LLVM, ROCm, CUDA. This would help detect outdated/changed APIs early so that stable hipSYCL releases are viable with newer toolchains for longer.
  • If I am not mistaken, the current linux CI pipelines all test against Ubuntu. This would probably not be a good choice for a build that tests against the latest versions of, well, anything :sweat_smile: A rolling release distro is probably wanted here, so that things like recent glibc, recent python etc are implicitly used.
  • If upstream toolchains are used for the development branch of hipSYCL, backwards compatibility may become simpler to maintain if it just means disabling unsupported features for older toolchains.
  • Instead of treating all toolchains equally, setting the "default" compatibility target for hipSYCL to upstream Clang/LLVM may provide a clearer way of handling backwards compatibility in general. Something like having the code feel like "this is my default code path, but if you use an older toolchain I have a bunch of macros ready to fall back to your older toolchain". Currently it feels more like "If you have this toolchain I'll do this. If you have that toolchain I'll do that". This makes it harder to decide whether to ENABLE new features or DISABLE old features.

Potential Issues

While I think the changes required to keep things always upstream-compatible are manageable, but I may be completely wrong here :sweat_smile:

  • CUDA seems to never change anything anyways, so probably not a lot of issues here.
  • For the CPU backends, breaking changes in Clang/LLVM can be fixed in hipSYCL.
  • The ROCm experience is horrible. Last time I checked, they used ancient LLVM versions and a codebase split across multiple codependent repos. Trivial fixes can take forever to be merged, since AMD is using different internal development branches than the ones on GitHub. Considering how many issues in hipSYCL complain about some kind of issue with ROCm, this may require an issue on its own.

Related

  • The Chromium project uses clang at "known good revisions, bumped every two weeks or so".
  • Abseil lives at HEAD.
  • We are currently using upstream Clang and friends in rules_ll. From our experience, bumping commits often (usually roughly weekly, sometimes more often thant that) actually rarely breaks builds. Most breaks are easily fixed with like 2 lines of code, since changes in LLVM are frequent, but small. Simply holding off on breaking version bump for a for a day or two is also often viable.
  • #727 is potentially related.
  • I read in various issues that there are plans to rework the CI infrastructure. Maybe it would be nice to keep track of that progress somewhere.

On our rules_ll side, we are currently using the "living at HEAD" approach for Clang/LLVM and CUDA, and will do whatever it takes to make it work with hipSYCL and ROCm :smiling_imp:

@illuhad @fodinabor I would love to hear your thoughts on this :heart:

aaronmondal avatar Jul 13 '22 20:07 aaronmondal

Thank you for your suggestion. Some thoughts regarding this:

  • Most hipSYCL users arguably are not planning to live on head. The most common hipSYCL use case is to just build it on top of your distribution's regular clang, avoiding having to compile clang/LLVM. If users don't mind compiling clang/LLVM, they can always turn to e.g. DPC++, which lives at the tip-of-the-tree.
  • That said, making HEAD clang work more reliably is always welcome, however, the projects you are referencing arguably have much more manpower to maintain upstream clang builds and investigate issues. Would you be willing to maintain this for hipSYCL?
  • For CI, Ubuntu is probably still the very best option. Latest clang can always be obtained from apt.llvm.org, if desired, and AMD provides well-supported ROCm packages for Ubuntu. We absolutely do not want to have to compile ROCm ourselves on some unsupported rolling release distribution (yes, we've been there ;) )

Instead of treating all toolchains equally, setting the "default" compatibility target for hipSYCL to upstream Clang/LLVM may provide a clearer way of handling backwards compatibility in general. Something like having the code feel like "this is my default code path, but if you use an older toolchain I have a bunch of macros ready to fall back to your older toolchain". Currently it feels more like "If you have this toolchain I'll do this. If you have that toolchain I'll do that". This makes it harder to decide whether to ENABLE new features or DISABLE old features.

I'm skeptical how big the practical implications of this would be. After all, we'd still need to have the exact same different code paths for various clang versions.

illuhad avatar Jul 18 '22 23:07 illuhad

@illuhad Thanks a lot for the feedback!

Most hipSYCL users arguably are not planning to live on head. The most common hipSYCL use case is to just build it on top of your distribution's regular clang, avoiding having to compile clang/LLVM.

Unfortunately true :smiling_face_with_tear:

If users don't mind compiling clang/LLVM, they can always turn to e.g. DPC++, which lives at the tip-of-the-tree.

We actually consider the fact that HipSYCL is not that tightly integrated into LLVM as a huge plus. DPC++s entanglement with the LLVM repo is more of a hindrance than an enhancement from a modularity perspective. Upstreaming of DPC++ into LLVM main progresses much slower than sometimes depicted. Feels like vendor lock-in all over again. From our point of view, HipSYCL is a much more portable, economical approach :heart:

That said, making HEAD clang work more reliably is always welcome, however, the projects you are referencing arguably have much more manpower to maintain upstream clang builds and investigate issues. Would you be willing to maintain this for hipSYCL?

I am relatively confident that this is viable even with limited manpower. I'll discuss with my team whether I can commit to this. We are working on this anyways, and when our CI intrastructure is done, we will run (daily?) builds and tests for HipSYCL on our servers anyways.

For CI, Ubuntu is probably still the very best option. Latest clang can always be obtained from apt.llvm.org, if desired, and AMD provides well-supported ROCm packages for Ubuntu.

Using the same OS for all CI runs limits coverage of build system related issues. It allows the build system to depend on Ubuntu specific things that shouldn't be depended upon. The fact that ~20% of open issues in the AMD HIP repo are related to the horrible build setup (statistic highly professionally acquired by searching for cmake in the issues) and issues like this and this illustrate this. Since the Ubuntu packages are more or less the only ones that work, there is probably not much that can be done until the AMD side fixes this (which is unlikely :disappointed:).

We absolutely do not want to have to compile ROCm ourselves on some unsupported rolling release distribution (yes, we've been there ;) )

Same. Horrible experience :rofl:. There may be binary distributions for archlinux available, but I think these only cover the ROCm stack without HIP. I'll to set up an exemplary CI build.

I'm skeptical how big the practical implications of this would be. After all, we'd still need to have the exact same different code paths for various clang versions.

This would probably require going over the existing macros that enable features by version compatibility and changing them to be used the same way in the entire repo. Maybe a unified features.hpp or similar that sets available features depending on the Clang/LLVM version would be an option. There would definitely be some refactoring involved. I am currently not familiar enough with the codebase to determine how much of an effort this would be. An rg LLVM_VERSION -t cpp | wc -l returns 32 occurrences.

aaronmondal avatar Jul 19 '22 05:07 aaronmondal

@illuhad OK I would be able to maintain upstream compatibility. Here would be my plan for integrating a pipelinerun that tests HEAD in hipSYCL:

  1. I will create a fork of this repo and experiment with CI pipelines until I get something that doesn't break existing workflows and wouldn't add enormous amounts of time to CI runs. I'll work on this in a fork because I do not want to hammer the hipSYCL CI system with these tests.
  2. When things work I will create PRs so that you can evaluate whether this is useful to the main hipSYCL repo.
  3. If the PRs make it into the main hipSYCL repo I will maintain compatibility with the upstream toolchains.

Additionally, I will work on the following things that are mostly independent of hipSYCL itself, but may be of interest for other hipSYCL users.

Even a HEAD pipeline for the main hipSYCL repo cannot be fully upstream, because it would require hammering the hipSYCL CI system with Clang/LLVM builds over and over again. This would be very bad for CI runtimes, and builds that are THIS upstream will break very frequently.

Since I still want to have fully source based upstream builds, I will create these more expensive pipeline runs in a separate, independent "Community Build" repo. This repo will use a different build system than the hipSYCL repo, and will most likely just act as a "coverage increaser" in the beginning. This way we can detect upcoming deprecations early and build all dependencies from source with very expensive pipeline runs, while keeping hipSYCL itself mostly as it is.

To solve the AMD issues, I will rewrite the AMD build files. On our side, we have played with this idea for a while now and have come to the conclusion that this is the only solution to work with the ROCm stack on source-based upstream CI runs. We will create an overlay on a "best effort" basis, similar to the Clang/LLVM bazel overlay in the main LLVM repo.

It is an ambitious project, but it is not unrealistic. We have many of the required parts already in place, and I believe this will be completed within this year 😊

aaronmondal avatar Jul 21 '22 16:07 aaronmondal

Yes, the AMD ROCm DevOps story is suboptimal... :-( But we can hope for some progress, even if it slow to happen.

keryell avatar Aug 04 '22 00:08 keryell

Update: It took a bit longer than originally anticipated, but hey it took less than a year 😅

I now have a seemingly functional build of the ROCm stack in rules_ll. I still need to prepare some docs before I can cut a new release, but the HIP toolchain should then be usable within the next few days.

My plan is to stabilize this HIP toolchain a bit, fix some remaining issues along the way and then move on to OpenSYCL 😊

Edit: Hahaha @keryell apparently we are interested in quite similar issues https://github.com/RadeonOpenCompute/ROCm-CompilerSupport/issues/54

aaronmondal avatar Mar 11 '23 10:03 aaronmondal

Edit: Hahaha @keryell apparently we are interested in quite similar issues RadeonOpenCompute/ROCm-CompilerSupport#54

This has always been a long story about how to embed device code into host code in a portable way :-), like I had on my side https://github.com/triSYCL/triSYCL/blob/master/src/triSYCL_tool.cpp

keryell avatar Mar 11 '23 18:03 keryell