xla icon indicating copy to clipboard operation
xla copied to clipboard

Explicitly export macOS x86 config setting in LLVM build file patch

Open nicholasjng opened this issue 2 years ago • 2 comments
trafficstars

Fixes a build failure in XLA on macOS ARM64, which occurred because the presently used setting "@bazel_tools//src/conditions:darwin" collides with the explicitly exported macos_arm64 setting in case of a macOS ARM64 host machine.


This addresses https://github.com/google/jax/issues/15005 , where I used this patch to fix jaxlib builds on macOS ARM64.

nicholasjng avatar Mar 19 '23 10:03 nicholasjng

Hi Nicholas,

Can you comment on how this is used or built in the JAX side? I ask because the current build of Tensorflow with this change on mac fails. Unfortunately, I can't find a publicly available link to the log, but it has failures like this:

2023-03-20 12:17:02.244956: E [tensorflow/compiler/xla/service/cpu/simple_orc_jit.cc:216](https://cs.corp.google.com/piper///depot/google3/tensorflow/compiler/xla/service/cpu/simple_orc_jit.cc?l=216&cl=517913715)] Unable to resolve runtime symbol: `sincosf'.  Hint: if the symbol a custom call target, make sure you've registered it with the JIT using XLA_CPU_REGISTER_CUSTOM_CALL_TARGET.
JIT session error: Symbols not found: [ sincosf ]
Fatal Python error: Bus error

Would this be a symptom caused where JAX is using some additional configs in the bazel build that Tensorflow is not?

tpopp avatar Mar 20 '23 15:03 tpopp

Sure, I used this patch to build jaxlib on my MacOS ARM64 machine with a local XLA repo (as described in the comment here).

I am not aware of JAX using any additional configs that TF doesn't - seems like this patch works with JAX, but breaks TF (although the log you shared seems way too "heavy" to come from changing a single build setting? What do you think?)

Turns out I was using Bazel 6.1.0, while JAX/XLA/TF are using Bazel 5.x. So there might be more at play than just the config setting clash reported in the issue above. I thought the change here was the most logical thing to do (after all, why would only the darwin setting not get its own LLVM export there?), and I succeeded in building jaxlib with this, so I thought this would work for XLA as a whole.

Seems like that is not the case, though. I closed the issue on the JAX side, since a Bazel upgrade is not of urgent priority there, I can close this as well if you like.

nicholasjng avatar Mar 20 '23 15:03 nicholasjng

Sorry for the delay on this. It took me a while to run some Tensorflow tests. Can you see if the following in toolchains.patch would work for your use case? I'm hoping it is just additions to your change that would support both your case and other tested cases.

diff --git a/utils/bazel/llvm-project-overlay/llvm/BUILD.bazel b/utils/bazel/llvm-project-overlay/llvm/BUILD.bazel
index 55064fba0bf8..8298a58a582c 100644
--- a/utils/bazel/llvm-project-overlay/llvm/BUILD.bazel
+++ b/utils/bazel/llvm-project-overlay/llvm/BUILD.bazel
@@ -4,6 +4,7 @@
 
 load("@bazel_skylib//rules:common_settings.bzl", "string_flag")
 load("@bazel_skylib//rules:expand_template.bzl", "expand_template")
+load("@bazel_skylib//lib:selects.bzl", "selects")
 load(":tblgen.bzl", "gentbl")
 load(":config.bzl", "llvm_config_defines")
 load(":targets.bzl", "llvm_targets")
@@ -19,6 +20,58 @@ licenses(["notice"])
 
 exports_files(["LICENSE.TXT"])
 
+
+config_setting(
+    name = "macos_arm64",
+    values = {
+        "apple_platform_type": "macos",
+        "cpu": "darwin_arm64",
+    },
+)
+# Sometimes Bazel reports darwin_x86_64 as "darwin" and sometimes as
+# "darwin_x86_64". The former shows up when building on a Mac x86_64 host for a Mac x86_64 target.
+# The latter shows up when cross-compiling for Mac x86_64 from a Mac ARM machine and in internal
+# Google builds.
+config_setting(
+    name = "macos_x86_64_default",
+    values = {
+        "apple_platform_type": "macos",
+        "cpu": "darwin",
+    },
+)
+
+config_setting(
+    name = "macos_x86_64_crosscompile",
+    values = {
+        "apple_platform_type": "macos",
+        "cpu": "darwin_x86_64",
+    },
+)
+
+selects.config_setting_group(
+    name = "macos_x86_64",
+    match_any = [
+        ":macos_x86_64_default",
+        ":macos_x86_64_crosscompile",
+    ],
+    visibility = ["//visibility:public"],
+)
+
+config_setting(
+    name = "linux_aarch64",
+    values = {"cpu": "aarch64"},
+)
+
+config_setting(
+    name = "linux_ppc64le",
+    values = {"cpu": "ppc"},
+)
+
+config_setting(
+    name = "linux_s390x",
+    values = {"cpu": "s390x"},
+)
+
 # It may be tempting to add compiler flags here, but that should be avoided.
 # The necessary warnings and other compile flags should be provided by the
 # toolchain or the `.bazelrc` file. This is just a workaround until we have a
diff --git a/utils/bazel/llvm-project-overlay/llvm/config.bzl b/utils/bazel/llvm-project-overlay/llvm/config.bzl
index 5507f80efa0b..14fea8a019e4 100644
--- a/utils/bazel/llvm-project-overlay/llvm/config.bzl
+++ b/utils/bazel/llvm-project-overlay/llvm/config.bzl
@@ -90,11 +90,12 @@ os_defines = select({
 # TODO: We should split out host vs. target here.
 llvm_config_defines = os_defines + select({
     "@bazel_tools//src/conditions:windows": native_arch_defines("X86", "x86_64-pc-win32"),
-    "@bazel_tools//src/conditions:darwin_arm64": native_arch_defines("AArch64", "arm64-apple-darwin"),
-    "@bazel_tools//src/conditions:darwin_x86_64": native_arch_defines("X86", "x86_64-unknown-darwin"),
-    "@bazel_tools//src/conditions:linux_aarch64": native_arch_defines("AArch64", "aarch64-unknown-linux-gnu"),
-    "@bazel_tools//src/conditions:linux_ppc64le": native_arch_defines("PowerPC", "powerpc64le-unknown-linux-gnu"),
-    "@bazel_tools//src/conditions:linux_s390x": native_arch_defines("SystemZ", "systemz-unknown-linux_gnu"),
+    "//llvm:macos_arm64": native_arch_defines("AArch64", "arm64-apple-darwin"),
+    "//llvm:macos_x86_64": native_arch_defines("X86", "x86_64-unknown-darwin"),
+    "@bazel_tools//src/conditions:darwin": native_arch_defines("X86", "x86_64-unknown-darwin"),
+    "//llvm:linux_aarch64": native_arch_defines("AArch64", "aarch64-unknown-linux-gnu"),
+    "//llvm:linux_ppc64le": native_arch_defines("PowerPC", "powerpc64le-unknown-linux-gnu"),
+    "//llvm:linux_s390x": native_arch_defines("SystemZ", "systemz-unknown-linux_gnu"),
     "//conditions:default": native_arch_defines("X86", "x86_64-unknown-linux-gnu"),
 }) + [
     "LLVM_VERSION_MAJOR={}".format(LLVM_VERSION_MAJOR),

tpopp avatar Mar 22 '23 09:03 tpopp

Hey, thanks for coming back to me on this! Unfortunately, the change errors out again with the same message (collision between @llvm-project//llvm:macos_arm64 and @bazel_tools//src/conditions:darwin), since it reverts back to @bazel_tools//src/conditions:darwin in the last hunk. Is that intentional?

nicholasjng avatar Mar 22 '23 09:03 nicholasjng

Unfortunately, it was needed for building for iOS, but I will confirm I didn't make another mistake. Out of curiosity, have you tried making the file empty and seeing if that works? This file was originally created due to TF issues, so potentially it is not needed at all for XLA/JAX.

tpopp avatar Mar 22 '23 09:03 tpopp

Good hunch, that works! Sometimes the easiest solutions are the best ones. How do we go about this then? Unbreaking XLA/JAX builds for newer Bazel still seems worthwhile to me.

nicholasjng avatar Mar 22 '23 10:03 nicholasjng

I will follow up on this soon and add appropriate people. Tensorflow folks (who are still the largest contributors to XLA infrastructure) have already been working on upgrading to a newer version of bazel, including solving failures in their Mac builds.

I had overlooked that you came across this issue with bazel 6, but now realize that they probably already know roughly or exactly what is needed for the tensorflow side.

tpopp avatar Mar 22 '23 10:03 tpopp

It looks like I was mistaken that Tensorflow still needs this either unless I did something wrong. If you want to update your PR to remove this completely, I can shepherd it through, so you get credit.

tpopp avatar Mar 30 '23 09:03 tpopp

Sure, give me a minute. Remove as in remove the files, or leave them empty? Asking because some of the patches have a "do not delete even if empty" note, although this one doesn't.

nicholasjng avatar Mar 30 '23 10:03 nicholasjng

You can fully remove it as long as you remove the reference to the patch file like here: https://github.com/openxla/xla/blob/main/third_party/llvm/workspace.bzl#L23

tpopp avatar Mar 30 '23 11:03 tpopp

Done!

nicholasjng avatar Mar 30 '23 11:03 nicholasjng

So unfortunately, do to the way we mirror code between Google and this github repository, i need to create a separate commit to do this. I will try to accredit it to you though, and update here once it is landed

tpopp avatar Apr 03 '23 15:04 tpopp

https://github.com/openxla/xla/commit/a3e20fbb13887320f756c27b4f1ddae12e3e630c landed this with a bunch of other formatting changes for some reason

tpopp avatar Apr 03 '23 16:04 tpopp

I should have suggested you add yourself as author too while you were at it, but you were too quick on the draw ;)

Thanks! Really appreciate the followups and continued updates, even though most of the logs were not openly accessible.

nicholasjng avatar Apr 03 '23 17:04 nicholasjng