ring icon indicating copy to clipboard operation
ring copied to clipboard

build: Only replace `cl.exe` with `clang-cl` for ARM64 Windows builds

Open MarijnS95 opened this issue 1 year ago • 13 comments

Fixes #2215

[!CAUTION] This change depends on https://github.com/rust-lang/cc-rs/pull/1357.

When cross-compiling to Windows from Linux or similar, it's common to use the clang-cl driver from the LLVM toolchain which supports parsing cl.exe-like arguments.

Ring however doesn't seem to compile for ARM64 Windows using cl.exe, and contains a // FIXME-style workaround to use clang to compile its C files instead.

The command-line interface for clang isn't always compatible with that of cl.exe and clang-cl. There didn't seem to be any trouble with this yet, but when cross-compiling from Linux it's common to explicitly provide "sysroots" via -vctoolsdir and -winsdkdir in CFLAGS. In such a setup this workaround in ring would pass those arguments to clang, resulting in "unknown argument" errors.

cc-rs can tell us exactly what compiler it found, and we can use this information to decide how to fill in this workaround. If the user was already compiling their code with clang-cl, nothing has to be replaced. In the end, all this entails is changing the workaround to not compile with clang, but with clang-cl instead.

MarijnS95 avatar Jan 10 '25 13:01 MarijnS95

In the commit message:

build: Don't overwrite clang-cl with clang for ARM64 Windows builds

here and elsewhere you use the word "overwrite" when I think "override" is the better word.

When ring however overwrites this compiler with clang,

I suggest instead we say "When this is done,".

existing user arguments in i.e. CFLAGS are not compatible resulting in various "unknown argument" errors such as for -vctoolsdir and -winsdkdir.

Most likely, we need to modify our compiler flag handling in build.rs to pass the correct flags to clang-cl.

As pointed out in the original ARM64 Windows support PR, compiling ring with clang-cl works equally well as clang: allow that by not overwriting the compiler with clang if it's already clang-cl, to not have to sort out incompatible arguments.

IIRC, when I tried it, clang-cl couldn't compile the assembly source files, which is why we hard-code clang.

We need to modify ci.yml to add the clang-cl-based configuration.

I also think that, assuming it works "equally well," we should default to the clang-cl-based build (instead of clang) when we detect the C compiler is MSVC and the target is aarch64.

briansmith avatar Jan 10 '25 18:01 briansmith

In ci.yml, we have (twice):

      - if: ${{ matrix.target == 'aarch64-pc-windows-msvc' }}
        run: |
          echo "C:\Program Files (x86)\Microsoft Visual Studio\2019\Enterprise\VC\Tools\Llvm\x64\bin" >> $GITHUB_PATH

When using clang-cl, is this still necessary, or is clang-cl already in the path in GitHub actions?

If clang-cl is more common and clang-cl is already in the path, then we should remove this from ci.yml.

This PR is solely about supporting a cross-compile from Linux (for #2215), and I have no clue about a "native" cross-compile from Windows->Windows (barring the architecture). If you want to know this for sure, I'd have to boot into a Windows machine for some experimentation.

It seems very unlikely that clang-cl is available on PATH any more than cl.exe; it is a specific tool to get a cl.exe-like command line interface while using the LLVM compiler stack.

We should also update BUILDING.md "For Windows ARM64 targets..." to describe the updated situation.

I don't think much is changing, or supposed to be changed here. All this PR really is about, is making the constraints for this // FIXME workaround more strict so that it doesn't replace the requested/provided compiler too eagerly unless absolutely deemed necessary.

As above, if we're "blindly" replacing the compiler command with "clang" without ever looking at the compiler-specific arguments and whether they're compatible, we better isolate such workarounds thoroughly.

The best course of action would be if this project could natively compile for ARM64 with cl.exe so that none of this "compiler-command replacing" is necessary, though I didn't read up on how feasible that would be.

MarijnS95 avatar Jan 10 '25 18:01 MarijnS95

In the commit message:

build: Don't overwrite clang-cl with clang for ARM64 Windows builds

here and elsewhere you use the word "overwrite" when I think "override" is the better word.

Agreed. It doesn't overwrite the compiler, but it does "overwrite" or "overrule" a CC environment flag or similar in the users' environment.

When ring however overwrites this compiler with clang,

I suggest instead we say "When this is done,".

NAK. I didn't introduce the idea of "overwrite the compiler with clang" yet before this part of the paragraph, so this would refer to nothing.

existing user arguments in i.e. CFLAGS are not compatible resulting in various "unknown argument" errors such as for -vctoolsdir and -winsdkdir.

Most likely, we need to modify our compiler flag handling in build.rs to pass the correct flags to clang-cl.

As above, if we assume that user flags are more likely compatible with cl.exe than with clang, using clang-cl is the better option than picking clang.

I do think we can be more strict here. We're going to expose the .family() flag in cc-rs, so I'm thinking this:

match compiler.family() {
    // The workaround
    ToolFamily::Msvc { clang_cl: false } => c.compiler("clang-cl"),
    ToolFamily::Msvc { clang_cl: true } | ToolFamily::Clang { .. } => {},
    ToolFamily::Gnu { .. } => todo!("Test this"),
}

As pointed out in the original ARM64 Windows support PR, compiling ring with clang-cl works equally well as clang: allow that by not overwriting the compiler with clang if it's already clang-cl, to not have to sort out incompatible arguments.

IIRC, when I tried it, clang-cl couldn't compile the assembly source files, which is why we hard-code clang.

Curious what that entails. For me it failed only because of warnings forced as errors in the git tree: https://github.com/briansmith/ring/blob/d2e401ff0b5bf0d056e553075e16458b0d8a882b/build.rs#L314

We need to modify ci.yml to add the clang-cl-based configuration.

I also think that, assuming it works "equally well," we should default to the clang-cl-based build (instead of clang) when we detect the C compiler is MSVC and the target is aarch64.

Sure let's always use clang-cl.

MarijnS95 avatar Jan 10 '25 19:01 MarijnS95

Curious what that entails. For me it failed only because of warnings forced as errors in the git tree:

In the issue, you mentioned:

error: mixing declarations and code is incompatible with standards before C99 [-Werror,-Wdeclaration-after-statement]

Of course, we require C99 at least, so we need to change build.rs to tell clang-cl to use the right version of C. You can see here we have:

        static NON_MSVC_FLAGS: &[&str] = &[
            "-fvisibility=hidden",
            "-std=c1x", // GCC 4.6 requires "c1x" instead of "c11"
            "-Wall",
            "-Wbad-function-cast",
            "-Wcast-align",
            "-Wcast-qual",
            "-Wconversion",
            "-Wmissing-field-initializers",
            "-Wmissing-include-dirs",
            "-Wnested-externs",
            "-Wredundant-decls",
            "-Wshadow",
            "-Wsign-compare",
            "-Wsign-conversion",
            "-Wstrict-prototypes",
            "-Wundef",
            "-Wuninitialized",
        ];

So, when the compiler is clang-cl, we need to pass the clang-cl equivalent of -std=c11 .

In the issue, you also mentioned:

error: unsafe buffer access [-Werror,-Wunsafe-buffer-usage]

Obviously, this is pretty concerning and I'd like to prioritize addressing it.

The reason we have all of these warnings enabled in the .git build is precisely to ensure compatibility with the compilers we support.

If we're adding support for a new compiler, as we are here with clang-cl, then we need to have CI test this configuration. Which means doing the ci.yml changes and getting the build to pass in CI in the .git configuration where all the warnings are enabled.

briansmith avatar Jan 10 '25 19:01 briansmith

NAK. I didn't introduce the idea of "overwrite the compiler with clang" yet before this part of the paragraph, so this would refer to nothing.

It is introduced by the subject line "Don't override clang-cl with clang for ARM64 Windows builds."

briansmith avatar Jan 10 '25 19:01 briansmith

So, when the compiler is clang-cl, we need to pass the clang-cl equivalent of -std=c11 .

The format for this seems to be /std:c11, but it does not resolve those "mixed declarations and code" warnings. I can see the flag being passed:

  error occurred in cc-rs: Command LC_ALL="C" "clang-cl" "-nologo" "-MD" "-Z7" "-Brepro" "--target=aarch64-pc-windows-msvc" "-I" "ring/include" "-I" "ring/target/aarch64-pc-windows-msvc/debug/build/ring-215be1f34e722e16/out" "-Wno-unused-command-line-argument" "-fuse-ld=lld-link" "-vctoolsdir/home/marijn/.xwin-cache/splat/crt" "-winsdkdir/home/marijn/.xwin-cache/splat/sdk" "/std:c11" "/Gy" "/Zc:wchar_t" "/Zc:forScope" "/Zc:inline" "/Wall" "/wd4127" "/wd4464" "/wd4514" "/wd4710" "/wd4711" "/wd4820" "/wd5045" "-WX" "-Fo./ring/target/aarch64-pc-windows-msvc/debug/build/ring-215be1f34e722e16/out/25ac62e5b3c53843-curve25519.o" "-c" "--" "
"ring/crypto/curve25519/curve25519.c" with args clang-cl did not execute successfully (status code exit status: 1).

In the issue, you also mentioned:

error: unsafe buffer access [-Werror,-Wunsafe-buffer-usage]

Obviously, this is pretty concerning and I'd like to prioritize addressing it.

I hope it shows up in CI, or that you can reproduce it locally when cross-compiling for Windows ARM64 using clang/LLVM.

MarijnS95 avatar Jan 10 '25 20:01 MarijnS95

I've rewritten the patch and description:

  1. We now use clang-cl in place of clang, because:
    1. The error I got when cross-compiling (passing cl.exe arguments to clang instead of clang-cl);
    2. If the user is already compiling with clang, nothing needs to be changed;
    3. Compiling with GNU is currently not supported (but is a valid target);
  2. This means the clang-cl path is now natively tested in CI.

Note that this means we have the following targets to add to CI:

  • [x] (Cross-)compiling from a Windows machine to ARM64 Windows using the default cl.exe (triggers the // FIXME to replace it with clang-cl);
  • [ ] (Cross-)compiling from a Windows machine to ARM64 Windows using LLVM / clang;
  • [ ] Cross-compiling from a Linux machine to ARM64 Windows with LLVM (my local target for testing);
  • [ ] :grey_question: Compiling natively on an ARM64 Windows machine using the default cl.exe;
  • [ ] :grey_question: Compiling natively on an ARM64 Windows machine using LLVM / clang.

MarijnS95 avatar Jan 10 '25 20:01 MarijnS95

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 97.16%. Comparing base (d2e401f) to head (ac6451a).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2216      +/-   ##
==========================================
+ Coverage   97.02%   97.16%   +0.13%     
==========================================
  Files         160      158       -2     
  Lines       20391    20330      -61     
  Branches      458      455       -3     
==========================================
- Hits        19785    19754      -31     
+ Misses        498      471      -27     
+ Partials      108      105       -3     

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

codecov[bot] avatar Jan 10 '25 20:01 codecov[bot]

In ci.yml, we have (twice):

      - if: ${{ matrix.target == 'aarch64-pc-windows-msvc' }}
        run: |
          echo "C:\Program Files (x86)\Microsoft Visual Studio\2019\Enterprise\VC\Tools\Llvm\x64\bin" >> $GITHUB_PATH

When using clang-cl, is this still necessary, or is clang-cl already in the path in GitHub actions?

Going all the way back to this, I'm confused by the CI setup:

For the test: job, there's a massive target: list. However, no host_os is set (mandatory for runs-on), so I assume only the explicit pairs in include: are actually running and this list is redundant?

For the "twice" case you mentioned, in test-features: there's no aarch64-pc-windows-msvc present in the matrix at all so if: ${{ matrix.target == 'aarch64-pc-windows-msvc' }} will likely never hit?

MarijnS95 avatar Jan 10 '25 20:01 MarijnS95

In this PR I added /std:c11 and it's clearly visible in the CLI args for clang-cl, yet it keeps failing with a bunch of warnings indicating that c11 support wasn't enabled? (This is a local snippet, CI running on this branch in my fork is flooded with other warnings like -Wdeclaration-after-statement and -Wunsafe-buffer-usage before even printing these other errors.

  cargo:warning=ToolExecError: command did not execute successfully (status code exit status: 1): LC_ALL="C" "clang-cl" "-nologo" "-MD" "-O2" "-Z7" "-Brepro" "--target=aarch64-pc-windows-msvc" "-I" "ring/include" "-I" "target/aarch64-pc-windows-msvc/debug/build/ring-5d0b5febe883c645/out" "-WX" "/Gy" "/Zc:wchar_t" "/Zc:forScope" "/Zc:inline" "/Wall" "/wd4127" "/wd4464" "/wd4514" "/wd4710" "/wd4711" "/wd4820" "/wd5045" "/std:c11" "-fuse-ld=lld-link" "-vctoolsdir/home/marijn/.xwin-cache/splat/crt" "-winsdkdir/home/marijn/.xwin-cache/splat/sdk" "-o" "target/aarch64-pc-windows-msvc/debug/build/ring-5d0b5febe883c645/out/356905f97a05e3f6-sha256-armv8-win64.o" "-c" "--" "target/aarch64-pc-windows-msvc/debug/build/ring-5d0b5febe883c645/out/sha256-armv8-win64.S"
  cargo:warning=ring/crypto/fipsmodule/ec/p256-nistz.c(231,3): error: '_Alignas' is incompatible with C standards before C11 [-Werror,-Wpre-c11-compat]
  cargo:warning=  231 |   alignas(64) P256_POINT table[16];
  cargo:warning=      |   ^
...

The better solution is / might still be to prefer cc to compile with clang (which should ignore user CFLAGS when CC is likewise "ignored") and not have a clang-cl path to supersede a user-set CFLAGS+CC for cl.exe.

MarijnS95 avatar Mar 12 '25 09:03 MarijnS95

Regarding availability of clang-cl again, I was experimenting with our Debian bookworm runner image (Linux), and it doesn't provide clang-cl from neither the llvm-15 nor the clang-15 package, you need to install clang-tools-15 to get access to that.

Before however, our clang-cl was a simple symlink to /usr/lib/llvm-15/bin/clang (and llvm-bin was a symlink to llvm-ar, since these binaries likely look at argv[0] to understand in what mode they're supposed to be running). Not sure why it's a binary blob in that package, maybe it's a copy of clang for no reason.

MarijnS95 avatar Mar 14 '25 15:03 MarijnS95

@briansmith At this stage this PR is a hand full of buildflag changes only and we've been running with it (and cross compiling with this PR) for about 6 months ourselves already. Would it be possible to get this merged into mainline or are there still significant objections to the content of the PR?

Jasper-Bekkers avatar Sep 26 '25 11:09 Jasper-Bekkers

I gleaned the idea to apply the -Wno-* flags that I already had in this PR exclusively to clang-cl from https://github.com/briansmith/ring/pull/2727 (so that we don't break existing cl.exe invocations), but that PR is otherwise identical to what I already had here.

MarijnS95 avatar Oct 15 '25 13:10 MarijnS95

We would love to see this merged into mainline as well in order to unblock some of our builds on the aarch64-pc-windows-msvc target.

oskirby avatar Dec 08 '25 20:12 oskirby

@briansmith is there anything we can do to further this PR? I could rebase it if it helps, but there are no conflicts at the moment.

MarijnS95 avatar Dec 09 '25 11:12 MarijnS95