heir Fix "Integrate LLVM" breaking main

The (at the time of writing) most recent LLVM Integrate commit (497630d) fails to build due to an error in LLVM (action output):

external/llvm-project/mlir/lib/Conversion/ArithToEmitC/ArithToEmitC.cpp:18:10: fatal error: mlir/Dialect/EmitC/Transforms/TypeConversions.h: No such file or directory
   18 | #include "mlir/Dialect/EmitC/Transforms/TypeConversions.h"
      |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
compilation terminated.

I'm actually not entirely sure what's going on here, as the file still seems to exist upstream? Maybe an issue with the bazel overlay? Either way, it'd be nice if there was a way to set it up so that the system does not Integrate LLVM if it fails our CI and instead merely create a PR/issue.

Jul 11 '24 05:07 AlexanderViand-Intel

Was this somehow fixed in https://github.com/google/heir/commit/72f06750e7170477e7ffcce603710775244e4018?

I think the issue might be that internally since we are not a production issue our build failures don't block submission of their integrations, but I can block copybara from updating the LLVM hash. I'll get that in today!

Edit: I don't really see why that would be only blocking us. That's weird. I'll dig into that internal PR and see what happened.

Jul 11 '24 12:07 asraa

It looks like someone had forced submitted that integrate PR - what happened was the external Bazel LLVM BUILD file wasn't updated properly and missed that dependency.

Solution is still to block copybara for that! WIP

Jul 11 '24 12:07 asraa

It should be blocking LLVM integrate, but we can't stop a force submit in general. I think it will just be an occasional hazard to deal with.

On Thu, Jul 11, 2024, 5:59 AM asraa @.***> wrote:

It looks like someone had forced submitted that integrate PR - what happened was the external Bazel LLVM BUILD file wasn't updated properly and missed that dependency.

Solution is still to block copybara for that! WIP

— Reply to this email directly, view it on GitHub https://github.com/google/heir/issues/786#issuecomment-2222878042, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAS2PKQMJZF6XPUZZNIGUB3ZLZ6UXAVCNFSM6AAAAABKWHQ4BWVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDEMRSHA3TQMBUGI . You are receiving this because you are subscribed to this thread.Message ID: @.***>

Jul 11 '24 13:07 j2kun

Checking back in: did you figure out a way to prevent this in the future, or do we just live with it?

Jul 31 '24 10:07 AlexanderViand-Intel

I talked to the MLIR rotation folks, who said that this could happen again since the branch they integrate upstream changes to can locally patch the BUILD files before upstream LLVM reconciles them.

Then the question is if copybara can delay posting the LLVM SHA updates until CI is green - like @j2kun said I don't think that's really possible with the way copybara works right now. Probably it would only be possible if we had some kind of staged branch that included a check on the open source build (and somehow had an easy way to keep it in sync).

I think right now we have to do patch maintenance when that does happen, and I can add some docs for how to do that. Hopefully it's rare enough that the upstream fix isn't integrated at the same time as the breakage :/

Jul 31 '24 14:07 asraa

Thanks for the update! Since it does seem to be pretty rare, I guess it's not worth worrying about too much :)

Jul 31 '24 14:07 AlexanderViand-Intel

...and once again, the most recent LLVM integrate is for a commit with a borked bazel overlay...

Aug 16 '24 07:08 AlexanderViand-Intel

Found them talking about this on the internal rotation chat: cause: https://github.com/llvm/llvm-project/pull/102494 fix: https://github.com/llvm/llvm-project/pull/104054

Looks like they just posted some new rotation docs, so going to go understand what the new process is for fixing bazel builds in a second, but going to patch our build for now

Aug 16 '24 11:08 asraa

@asraa I'm guessing it's another case of a patch not having made it to us yet?

llvm-project/mlir/lib/Conversion/VectorToSPIRV/VectorToSPIRV.cpp:20:10: error: module @llvm-project//mlir:VectorToSPIRV does not depend on a module exporting 'mlir/Dialect/Utils/StaticValueUtils.h' #include "mlir/Dialect/Utils/StaticValueUtils.h"

Nov 07 '24 06:11 AlexanderViand-Intel

An internal dev decided to force-push a broken change. Sigh.

Nov 07 '24 15:11 j2kun

Yeah I got a message over night from someone asking if they could skip our presubmit.

But, they then said they ran the workflow manually and it succeeded so I don't know if that's a red herring or not...

EDIT: I think the workflow run that I see that succeeded was just the export, not the build.

Nov 07 '24 15:11 asraa

Anyway, patch incoming!

Nov 07 '24 15:11 asraa

It was fixed upstream in a few patches https://github.com/llvm/llvm-project/commits/main/utils/bazel/llvm-project-overlay/mlir/BUILD.bazel. We could just wait for the next integrate CL.

Nov 07 '24 15:11 j2kun

Yeah, I found the patch fix - but we don't know if it will take 3-4 days to integrate

Nov 07 '24 15:11 asraa

It's Thursday, we can push the integrate ourselves if need be. I'll go investigate once I'm in the office.

Nov 07 '24 15:11 j2kun

An update: I've figured out how other projects manage this at Google. The old situation was that if a bazel build broke some Google project, they'd fix it up stream and re-integrate up to that fix. Since July or so the new process is to make the changes upstream and cherry-pick them to an internal set of patches. Then projects like Tensorflow that need to keep those patches in their GitHub trees have some tooling to export them.

I think all we need to do is clone Tensorflow's patch and apply it during our bazel workspace instantiation. It would also be prudent to include the patch application step in the CMake build, if that's easy. The visible change we'll see is that LLVM integration commits will come with a delta to a (new) patch file. I'll put that in a new top-level directory called patches in case we need to add any other general patches to dependent libraries.

Nov 08 '24 17:11 j2kun

Should be fixed by https://github.com/google/heir/pull/1080

Nov 08 '24 18:11 j2kun

I think all we need to do is clone Tensorflow's patch and apply it during our bazel workspace instantiation. It would also be prudent to include the patch application step in the CMake build, if that's easy. The visible change we'll see is that LLVM integration commits will come with a delta to a (new) patch file. I'll put that in a new top-level directory called patches in case we need to add any other general patches to dependent libraries.

❤ This is great, thanks! In CMake, the hard part probably isn't applying the patch (just a custom command) but the general switch from assuming a pre-built MLIR to having CMake download and build llvm😅

Nov 08 '24 20:11 AlexanderViand-Intel

Given that the majority of the patch contents are bazel build file changes, we can probably get away with ignoring the patches in the cmake build the vast majority of the time.

Nov 08 '24 20:11 j2kun