
Any possible way to set the host_compiler explicitly instead of inferring from current toolchain?

Open wyhao31 opened this issue 1 year ago • 20 comments

This sounds like a weird request, and here's my use case.

I use a customized C++ toolchain in my project. In order to use ccache, I wrap the actual compile command in a script and set the script's path as the tool_path in the toolchain definition.

tool_path(
    name = "gcc",
    path = "/path/to/wrap-script",
),

When using cuda_library, /path/to/wrap-script shows up as the argument after nvcc -ccbin, which makes nvcc hang forever. The correct way to use ccache for cuda libraries is to wrap the nvcc call with ccache, not the gcc call. This is why I want to set the host_compiler explicitly.
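For illustration, a minimal sketch of such a wrapper, with the nvcc call itself wrapped by ccache rather than the host compiler that nvcc receives via -ccbin. The file layout, function name, and REAL_NVCC variable are assumptions for this sketch, not part of rules_cuda or the original project:

```shell
#!/usr/bin/env bash
# Hypothetical nvcc wrapper sketch -- names and paths here are assumptions.
set -u

run_nvcc_with_ccache() {
  # Wrap the nvcc invocation with ccache; "$@" forwards all arguments
  # (including -ccbin <host-compiler>) to nvcc unchanged.
  ccache "${REAL_NVCC:-nvcc}" "$@"
}

# As an actual wrapper script, the last line would simply be:
# run_nvcc_with_ccache "$@"
```

The key point is that ccache sits in front of nvcc, so the host compiler passed via -ccbin stays a plain compiler binary and nvcc does not hang.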

I'm not sure whether I'm using it correctly or whether there's an existing way to work around this. Thanks in advance for any suggestions.

wyhao31 avatar Nov 08 '23 09:11 wyhao31

You need to configure the cuda toolchain manually to achieve it.

cloudhan avatar Nov 08 '23 10:11 cloudhan

Out of curiosity, why do you want to use ccache with bazel?

cloudhan avatar Nov 08 '23 10:11 cloudhan

You need to configure the cuda toolchain manually to achieve it.

Could you please point out how to do it manually? I noticed that host_compiler comes from cc_toolchain.compiler_executable [1], and cc_toolchain comes from find_cpp_toolchain [2]. IIUC, it will directly use the currently configured C++ toolchain.

[1] https://github.com/bazel-contrib/rules_cuda/blob/main/cuda/private/actions/compile.bzl#L36
[2] https://github.com/bazel-contrib/rules_cuda/blob/main/cuda/private/rules/cuda_library.bzl#L18

wyhao31 avatar Nov 09 '23 02:11 wyhao31

Out of curiosity, why do you want to use ccache with bazel?

We're migrating from CMake to Bazel, and we use ccache in our CMake setup. Besides the local cache, we also use ccache's secondary (remote) cache. I know Bazel has remote cache support as well, but we'll keep using ccache during the early stages of the migration.

wyhao31 avatar Nov 09 '23 02:11 wyhao31

The toolchains are instantiated from https://github.com/bazel-contrib/rules_cuda/tree/main/cuda/templates

cloudhan avatar Nov 09 '23 09:11 cloudhan

BTW, bazel also has local cache

cloudhan avatar Nov 09 '23 09:11 cloudhan

The toolchains are instantiated from https://github.com/bazel-contrib/rules_cuda/tree/main/cuda/templates

I think you're talking about the cuda toolchains. The host_compiler I mentioned above comes from the cc toolchain.

wyhao31 avatar Nov 09 '23 09:11 wyhao31

BTW, bazel also has local cache

Maybe I didn't explain it clearly. We want to use ccache's secondary cache, and we don't want to spend the effort to set up a Bazel remote cache, so we'll keep using ccache for some time.

wyhao31 avatar Nov 09 '23 09:11 wyhao31

Oops, it seems I remembered it incorrectly. _cc_toolchain is an implicit attribute of all cuda rules, but it is not generally customizable here. IIRC, there is no way to have more than one cc toolchain selected at runtime for cc rules, so there was no need to provide a way to configure it, because some actions of the cuda rules use cc actions to produce artifacts.

cloudhan avatar Nov 09 '23 10:11 cloudhan

I'm in a similar situation.

In our use case, we use clang as the C/C++ compiler (i.e. cc toolchain resolution chooses clang), but for CUDA code we'd like to use nvcc as the CUDA compiler. This results in errors, because nvcc seems to expect gcc to be passed to -ccbin. That can be verified by using gcc as the C/C++ compiler: nvcc is happy then.

But we still want most of our code (the non-CUDA code) compiled by clang instead of gcc, so I want to find a way to force rules_cuda to use nvcc together with gcc for CUDA code, while using clang for everything else (non-CUDA code).

Currently I have a very ugly workaround: I define a series of special cc_toolchain_suite targets that resolve every possible pair of cpu and compiler to gcc (even when --compiler=clang, etc.), and replace current_cc_toolchain in the rules_cuda implementation with it. But that is too hacky, and more importantly, when I try to switch to the new platform-based cc toolchain resolution mechanism (where cc_toolchain_suite is deprecated and a no-op), I cannot figure out a way to do the same hack.
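A sketch of that workaround under the legacy crosstool_top resolution; all labels here are hypothetical placeholders for the actual gcc cc_toolchain target, and the cpu/compiler keys depend on the configurations actually used:

```starlark
# BUILD sketch -- //toolchain:gcc_toolchain is a placeholder label.
cc_toolchain_suite(
    name = "cuda_host_suite",
    toolchains = {
        # Map every (cpu, compiler) pair back to the gcc toolchain,
        # even when --compiler=clang is in effect.
        "k8": "//toolchain:gcc_toolchain",
        "k8|gcc": "//toolchain:gcc_toolchain",
        "k8|clang": "//toolchain:gcc_toolchain",
    },
)
```

The rules_cuda implementation is then pointed at this suite instead of @bazel_tools//tools/cpp:current_cc_toolchain, which is exactly the part that has no equivalent under platform-based resolution.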

Any suggestion?

wudisheng avatar Apr 12 '24 05:04 wudisheng

@wudisheng I think nvcc can accept clang passed to -ccbin.

wyhao31 avatar Apr 12 '24 05:04 wyhao31

@wudisheng You are describing a different problem. To make nvcc use clang as the host compiler (NOTE: you must ensure the host compiler for cuda and the cc compiler for cc rules are the same), see:

https://github.com/cloudhan/rules_cuda_examples/blob/ebaf56457cb0cfc69ae0a10dfebd2e95fd109594/nccl/.bazelrc#L6-L8

build --flag_alias=cuda_compiler=@rules_cuda//cuda:compiler

# device: clang, host: clang, cc: clang
build:clang --repo_env=CC=clang
build:clang --cuda_compiler=clang

# device: nvcc, host: clang, cc: clang
build:nvcc_clang --repo_env=CC=clang
build:nvcc_clang --cuda_compiler=nvcc

# host: compiler_a, cc: compiler_b
# not supported

Otherwise it should be a bug or misconfiguration. https://github.com/bazel-contrib/rules_cuda/blob/8f2f2e6d64d38e46d09538c921304c7c902a2564/cuda/private/toolchain_configs/nvcc.bzl#L66-L78

cloudhan avatar Apr 12 '24 05:04 cloudhan

If for any reason you want to use a different host_compiler, here's what I did in my project.

  1. Create a patch of rules_cuda that sets host_compiler [1] to the path you want, and changes [2] and [3] to make sure the correct PATH environment variable is set.
  2. Apply the patch to rules_cuda in your WORKSPACE using the patches attribute [4].

I know it's kind of hacky, and I don't recommend it to others. It just fits my needs and works in my project.

[1] https://github.com/bazel-contrib/rules_cuda/blob/v0.2.1//cuda/private/actions/compile.bzl#L36
[2] https://github.com/bazel-contrib/rules_cuda/blob/v0.2.1/cuda/private/toolchain_configs/nvcc.bzl#L51
[3] https://github.com/bazel-contrib/rules_cuda/blob/v0.2.1/cuda/private/toolchain_configs/nvcc.bzl#L61
[4] https://bazel.build/rules/lib/repo/http#http_archive-patches
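Step 2 looks roughly like this in the WORKSPACE. The patch file label is a placeholder, and the sha256 is omitted here; the archive URL follows GitHub's standard tag-archive scheme for the v0.2.1 tag linked above:

```starlark
# WORKSPACE sketch -- patch label is a placeholder; pin sha256 in real use.
load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")

http_archive(
    name = "rules_cuda",
    urls = ["https://github.com/bazel-contrib/rules_cuda/archive/refs/tags/v0.2.1.tar.gz"],
    strip_prefix = "rules_cuda-0.2.1",
    # Patch produced against the repo root, e.g. with `git diff` (-p1):
    patches = ["//third_party:rules_cuda_host_compiler.patch"],
    patch_args = ["-p1"],
)
```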

wyhao31 avatar Apr 12 '24 06:04 wyhao31

@wyhao31 I think you can avoid a patch with bazel build --features=-host_compiler_path ...; this will (should, I am not quite sure =) ) prevent -ccbin <cc_toolchain.compiler_path> from being added to the command line. Then you can rely on the @rules_cuda//cuda:copts flag to pass in the -ccbin you want. :)
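As a sketch of the suggested combination, assuming the feature can in fact be disabled; /usr/bin/gcc here is a placeholder for whatever host compiler you want nvcc to use:

```
# .bazelrc sketch -- assumes host_compiler_path can actually be disabled;
# /usr/bin/gcc is a placeholder host compiler path.
build --features=-host_compiler_path
build --@rules_cuda//cuda:copts=-ccbin=/usr/bin/gcc
```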

[!WARNING] this is not recommended, especially on Windows!

[!WARNING] this will break hermeticity

cloudhan avatar Apr 12 '24 06:04 cloudhan

@wudisheng I think nvcc can accept clang passed to -ccbin.

In our specific environment (given the versions involved), no. And even if it could, we'd still want nvcc+gcc; otherwise we could simply use clang alone, without nvcc.

wudisheng avatar Apr 12 '24 06:04 wudisheng

@wyhao31 I think you can avoid a patch with bazel build --features=-host_compiler_path ...; this will (should, I am not quite sure =) ) prevent -ccbin <cc_toolchain.compiler_path> from being added to the command line. Then you can rely on the @rules_cuda//cuda:copts flag to pass in the -ccbin you want. :)

Thanks! Actually, I tried this approach before, but it turned out not to work. The reason is that cuda_compile_action implies host_compiler_path [1], so host_compiler_path cannot be disabled.

[1] https://github.com/bazel-contrib/rules_cuda/blob/v0.2.1/cuda/private/toolchain_configs/nvcc.bzl#L111

wyhao31 avatar Apr 12 '24 06:04 wyhao31

@wudisheng You are describing a different problem. To make nvcc use clang as the host compiler (NOTE: you must ensure the host compiler for cuda and the cc compiler for cc rules are the same), see:

https://github.com/cloudhan/rules_cuda_examples/blob/ebaf56457cb0cfc69ae0a10dfebd2e95fd109594/nccl/.bazelrc#L6-L8

build --flag_alias=cuda_compiler=@rules_cuda//cuda:compiler

# device: clang, host: clang, cc: clang
build:clang --repo_env=CC=clang
build:clang --cuda_compiler=clang

# device: nvcc, host: clang, cc: clang
build:nvcc_clang --repo_env=CC=clang
build:nvcc_clang --cuda_compiler=nvcc

# host: compiler_a, cc: compiler_b
# not supported

Otherwise it should be a bug or misconfiguration.

https://github.com/bazel-contrib/rules_cuda/blob/8f2f2e6d64d38e46d09538c921304c7c902a2564/cuda/private/toolchain_configs/nvcc.bzl#L66-L78

The "%{host_compiler}" here is /.../clang. That part is as I'd like it, because I want everything except CUDA code compiled by clang; but for CUDA code, I want nvcc -ccbin /.../gcc.

It seems this is not supported out of the box. I can hack around it, but I'm trying to find a better way to integrate it with the new platform-based toolchain resolution.

wudisheng avatar Apr 12 '24 06:04 wudisheng

I'm curious whether it is possible to let bazel use two differently configured toolchains for the same rule (on different targets) in a single bazel build...?

If it is possible, we can implement a toolchain_type for the host compiler (defaulting to the cc toolchain, as the status quo) and make it configurable.
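The idea could look something like this in Starlark. This is purely hypothetical; no such toolchain_type exists in rules_cuda today, and all labels and names below are illustrative:

```starlark
# Hypothetical sketch only -- rules_cuda has no such toolchain_type.

# In a rules_cuda BUILD file: declare a toolchain type for the host compiler.
toolchain_type(name = "host_compiler_toolchain_type")

# In a user's BUILD file: register an implementation pointing at the
# cc toolchain that nvcc should receive via -ccbin.
toolchain(
    name = "gcc_host_compiler",
    toolchain = "//toolchain:gcc_cc_toolchain",  # placeholder label
    toolchain_type = "@rules_cuda//cuda:host_compiler_toolchain_type",
)
```

The cuda rules would then resolve the host compiler through ctx.toolchains[...] instead of the implicit _cc_toolchain attribute, falling back to the regular cc toolchain when nothing is registered.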

If it's not possible, then it should be left as a hack, because it would easily break a lot of things...

cloudhan avatar Apr 12 '24 07:04 cloudhan

Technically, yes. The official way is to use transitions, which are considerably difficult.

For the legacy --*crosstool_top cc toolchain resolution, I can manually generate a (different) cc_toolchain_suite that has fake mappings from all possible (cpu, compiler) combinations to gcc, and assign it to the _cc_toolchain attribute in rules_cuda instead of @bazel_tools//tools/cpp:current_cc_toolchain.

This way I get the behavior I described above, but it does not work once I enable platform-based toolchain resolution, because there isn't a way to make a particular rule act as a constraint, and config_settings are generally global.

wudisheng avatar Apr 12 '24 07:04 wudisheng

I also ran into this issue. Maybe this is a naive question, but can't we convert the private attribute _cc_toolchain into a public cc_toolchain attribute while keeping the default?
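In Starlark terms, the suggestion amounts to something like the following change in the cuda_library rule definition. This is a sketch: the surrounding rule() call is abbreviated, and the actual definition lives in cuda/private/rules/cuda_library.bzl:

```starlark
# Sketch of the proposed change (illustrative only).
cuda_library = rule(
    implementation = _cuda_library_impl,  # existing implementation, unchanged
    attrs = {
        # ... other attributes elided ...
        # Before: a private attribute users cannot override.
        # "_cc_toolchain": attr.label(
        #     default = "@bazel_tools//tools/cpp:current_cc_toolchain"),
        # After: public, overridable per target, with the same default.
        "cc_toolchain": attr.label(
            default = "@bazel_tools//tools/cpp:current_cc_toolchain",
        ),
    },
)
```

A target could then pass its own cc_toolchain label, while everyone else keeps the current behavior through the default.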

hofbi avatar Sep 05 '24 10:09 hofbi