buck2 icon indicating copy to clipboard operation
buck2 copied to clipboard

Allow passing environment variables to `cmd_args`

Open cbarrete opened this issue 8 months ago • 6 comments

It would be nice if cmd_args had an env attribute like many of the prelude rules support. Is there a reason for not implementing it, or has it just not been an issue before?

In case this is an XY situation, the actual problem is that we'd like to LD_PRELOAD our C++ compiler to use jemalloc. We can use a wrapper script for that, but it makes buck2 log what-ran quite a bit less friendly.

cbarrete avatar Mar 21 '25 01:03 cbarrete

I would like this to, but I'm not sure what the semantics would / should be. Since a cmd_args is not necessarily a complete command line, it's a little strange. Also cmd_args can contain other cmd_args, and there can be an entire tree of them. How do environment variables get merged / flattened? Maybe it makes more sense on a RunInfo?

In my case, we are using a hermetically sealed windows toolchain + winsdk, and we ran into a situation where an open source projects has a header file named share.h. That project expects to find its own share.h, but windows sdk has a file named share.h too and that was the one being found.

The problem never shows up unless using a hermetic toolchain, because in a standard installed-VS environment, which I'd wager is what 99.9% of people use, you are in a VS Command Prompt which sets the INCLUDE environment variable, and INCLUDE environment variable is searched last. But if you put the command line options in your buck toolchain, the toolchain args go on the command line first. And since it's not using INCLUDE environment variable, it gets searched front to back.

While one could argue (and I did argue this) that the project should have name collisions with winsdk, it's still the case that the semantics of the build change unless you can put system include paths into the INCLUDE environment variable.

So we did something like this in our toolchain file, which seems to fix it:


def _compiler_wrapper(
    ctx: AnalysisContext,
    compiler: cmd_args,
    system_include_paths: list
) -> cmd_args:
    wrapper_file, _ = ctx.actions.write(
        ctx.actions.declare_output("cl_wrapper.bat"),
        [
            "@echo off",

            cmd_args(cmd_args(system_include_paths, delimiter=";"), format = "set INCLUDE={}"),
            cmd_args(compiler, absolute_suffix=" %*")
        ],
        allow_args = True,
    )

    return cmd_args(wrapper_file)


def _toolchain_impl(ctx):
    ...
    wrapper = _compiler_wrapper(ctx, c_cxx_compiler, system_include_paths)
    ...
    return [
        CxxToolchainInfo(
            c_compiler_info = CCompilerInfo(
                compiler = RunInfo(args = wrapper),
            )
        )
    ]

zjturner avatar Apr 04 '25 17:04 zjturner

I'm not sure what the semantics would / should be. Since a cmd_args is not necessarily a complete command line, it's a little strange. Also cmd_args can contain other cmd_args, and there can be an entire tree of them. How do environment variables get merged / flattened?

That's a very good point, I hadn't thought about that.

Maybe it makes more sense on a RunInfo?

I guess this would be an option? The only obvious downside I can think of is that RunInfos are only created as providers that are returned from rules (AFAIK), so creating one inline in a rule implementation only to pass it to actions.run would be unconventional. Not a deal breaker from my perspective though!

Your example is basically what we're doing with LD_PRELOAD, but that's quite cumbersome.

cbarrete avatar Apr 04 '25 20:04 cbarrete

I guess a third option would be to add an env provider field on the CxxCompilerInfo. That might even be best. If you put it on RunInfo, then every consumer of a RunInfo has to be updated to support it. If you put it on the CxxCompilerInfo then only probably a handful of callsites (maybe even just 1) need to be updated.

zjturner avatar Apr 07 '25 15:04 zjturner

I recently wanted exactly this (env in cmd_args) for similar reasons in a completely custom toolchain (for which the prelude has no provider). I also arrived at the workaround of "stash the state in the toolchain provider", but this doesn't seem ideal to me because it requires duplicated logic in every rule downstream of the toolchain.

On the other hand, in both my case and the cases described above, it might be useful to populate the environment variable based on a combination of both toolchain- and rule- or even target-defined information, so maybe the right approach is for the toolchain to define a wrapper around actions.run which is responsible for assembling the correct environment.

Ralith avatar Apr 07 '25 17:04 Ralith

There's not much difference between that and just putting an env on the toolchain provider. Either way it requires every existing call to ctx.actions.run to be altered.

zjturner avatar Apr 07 '25 18:04 zjturner

I didn't mean to describe that as an alternative, but rather something you'd combine with info in the provider to reduce code duplication in the final state.

Ralith avatar Apr 07 '25 18:04 Ralith