Generated `compile_commands.json` uses relative paths for `directory` field
Hi,
I'm having issues using the output generated by the `compilation-database` subtarget, which seem to boil down to clangd not accepting relative `directory` fields.
Here's how to reproduce the issue using examples/with_prelude:
```
> buck2 init --git
...
> buck2 build cpp/hello_world:main[compilation-database]
...
> find -name compile_commands.json
./buck-out/v2/gen/root/524f8da68ea2a374/cpp/hello_world/__main__/compile_commands.json
> ln -s ./buck-out/v2/gen/root/524f8da68ea2a374/cpp/hello_world/__main__/compile_commands.json .
> nvim cpp/hello_world/main.cpp
```
At this point clangd starts, but it generates a bunch of erroneous diagnostics and doesn't let me go to the definition of `print_hello`. If I edit `compile_commands.json` to make `directory` an absolute path and open `main.cpp` again, everything works.
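For concreteness, the change I make by hand only touches the `directory` field of each entry. The snippet below is illustrative (invented values, not actual Buck2 output); per the JSON compilation database format, clangd resolves relative `file` paths against `directory`, which is why a relative `directory` breaks resolution:

```json
{
  "file": "cpp/hello_world/main.cpp",
  "directory": ".",
  "arguments": ["clang++", "-c", "cpp/hello_world/main.cpp"]
}
```

After the hand edit, `directory` is something like `"/home/user/with_prelude"`, and clangd behaves.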
Arguably, clangd could support relative paths, but looking around on the internet suggests this is unlikely to happen (clang-tidy also depends on absolute `directory` paths).
Could the `directory` path that Buck2 generates be made absolute?
Thanks!
The problem is that if you write out absolute paths, it's no longer possible to cache the compilation database between users. It's fairly annoying.
Is sharing a compilation database between users a common thing to do? It's an artefact of the build tool; why would you share it instead of generating it per user?
Another question is whether the utility of being able to share the compilation database outweighs not being able to use clangd without postprocessing it, which seems rather annoying.
Sharing between users is super common, since it speeds everything up.
That said, sharing a useless pile of bytes that doesn't work has no value. Maybe the toolchain should have a boolean that controls whether it generates absolute paths (with no caching) or relative paths (which can be cached). I'm trying to find out how we deal with this internally, since we use clangd somehow.
It seems like internally we use a BXL script as a wrapper that takes the compilation-database target, rewrites the paths to absolute ones, and then uses that for clangd. A bit fiddly, but it does ensure that everyone can share most of the work.
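A rough standalone sketch of that post-processing step (this is a hypothetical Python script I wrote to illustrate the idea, not the internal BXL wrapper):

```python
import json
import os

def absolutize(db_path, repo_root):
    """Rewrite relative `directory` fields in a compilation database to
    absolute paths so clangd can consume it."""
    with open(db_path) as f:
        entries = json.load(f)
    for entry in entries:
        # os.path.join discards repo_root when the directory is already
        # absolute, so entries that are already fine are left unchanged.
        entry["directory"] = os.path.abspath(
            os.path.join(repo_root, entry["directory"]))
    with open(db_path, "w") as f:
        json.dump(entries, f, indent=2)
    return entries
```

You would run this over the symlinked `compile_commands.json` after each build; the downside is exactly the extra step this issue is about.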
> Sharing between users is super common, since it speeds everything up.
I would expect generating the compilation database to be very fast; is it actually worth speeding up via caching? And if it is worth it for huge monorepos, should this be the exception rather than the rule? Meaning, Buck2 would by default "just work", but larger organizations could set up caching/sharing if it becomes relevant for them.
There are downstream consumers that care whether the file was cached or updated, so it gets a bit tricky when we write out these paths if you want to make something absolute.
> There's downstream consumers that care if the file was cached or updated, so it gets a bit tricky when we write out these paths if you want to turn something absolute.
Could you elaborate @bobyangyf? (I somehow missed the replies on this issue, but recently hit this again)
More generally, my understanding is that the preferred solution for this would be to use other scripts to generate compilation databases. I don't necessarily mind this, but it is somewhat confusing that there exists a first-class `full-compilation-database` subtarget on all C++ targets, yet it is essentially broken out of the box.
For the record, I have been using the following script lately:
```python
load("@prelude//utils:utils.bzl", "flatten")
load("@prelude//cxx/comp_db.bzl", "CxxCompilationDbInfo")
load("@prelude//cxx/compile.bzl", "CxxSrcCompileCommand")

def _make_entry(ctx: bxl.Context, compile_command: CxxSrcCompileCommand) -> dict:
    args = compile_command.cxx_compile_cmd.base_compile_cmd.copy()

    # This prevents clangd from jumping into `buck-out` using Go To Definition,
    # which significantly improves user experience.
    args.add(["-I", "."])

    args.add(compile_command.cxx_compile_cmd.argsfile.cmd_form)
    args.add(compile_command.args)
    ctx.output.ensure_multiple(args)
    return {
        "file": compile_command.src,
        "directory": ctx.fs.abs_path_unsafe(ctx.root()),
        "arguments": args,
    }

def make_compilation_database(ctx: bxl.Context, actions):
    db = []
    for name, analysis_result in ctx.analysis(flatten(ctx.cli_args.targets)).items():
        comp_db_info = analysis_result.providers().get(CxxCompilationDbInfo)
        if comp_db_info:
            db += [_make_entry(ctx, cc) for cc in comp_db_info.info.values()]

    db_file = actions.declare_output("compile_commands.json")
    actions.write_json(
        db_file.as_output(),
        db,
        with_inputs = True,
        pretty = True,
    )
    return db_file

def _gen_impl(ctx: bxl.Context):
    actions = ctx.bxl_actions().actions
    ctx.output.print(ctx.output.ensure(make_compilation_database(ctx, actions)))

gen = bxl_main(
    impl = _gen_impl,
    cli_args = {
        "targets": cli_args.list(cli_args.target_expr()),
    },
)
```
What I like about it is that:
- It materializes all relevant inputs, but does not actually build the targets, so only e.g. argsfiles and codegen are run, automagically
- It heavily leverages the prelude so that there's a good chance that it is correct
- It doesn't use the relatively heavy compilation database subtargets, which I found difficult to work with
- It is split out so that other scripts (e.g. for clang-tidy integration) can reuse the core logic to generate a compilation database
- It's quite small and easy to grok