bazel icon indicating copy to clipboard operation
bazel copied to clipboard

[2/5] support C++20 Modules, add C++20 modules tools

Open PikachuHyA opened this issue 1 year ago • 15 comments
trafficstars

I split the XXL PR https://github.com/bazelbuild/bazel/pull/19940 into several small patches. This is the second patch of Support C++20 Modules, I add C++20 related tools

Overview

This patch contains two tools: aggregate-ddi and gen-modmap. These tools are designed to facilitate the processing of C++20 modules information and direct dependent information (DDI). They can aggregate module information, process dependencies, and generate module maps for use in C++20 modular projects.

The format of DDI

The format of DDI content is p1689. for example,

{
  "revision": 0,
  "rules": [
    {
      "primary-output": "path/to/a.pcm",
      "provides": [
        {
          "is-interface": true,
          "logical-name": "a",
          "source-path": "path/to/a.cppm"
        }
      ],
      "requires": [
        {
          "logical-name": "b"
        }
      ]
    }
  ],
  "version": 1
}

Tools

aggregate-ddi

Description

aggregate-ddi is a tool that aggregates C++20 module information from multiple sources and processes DDI files to generate a consolidated output containing module paths and their dependencies.

Usage

aggregate-ddi -m <cpp20modules-info-file1> -m <cpp20modules-info-file2> ... -d <ddi-file1> <path/to/pcm1> -d <ddi-file2> <path/to/pcm2> ... -o <output-file>

Command Line Arguments

  • -m <cpp20modules-info-file>: Path to a JSON file containing C++20 module information.
  • -d <ddi-file> <pcm-path>: Path to a DDI file and its associated PCM path.
  • -o <output-file>: Path to the output file where the aggregated information will be stored.

Example

aggregate-ddi -m module-info1.json -m module-info2.json -d ddi1.json /path/to/pcm1 -d ddi2.json /path/to/pcm2 -o output.json

generate-modmap

Description

generate-modmap is a tool that generates a module map from a DDI file and C++20 modules information file. It creates two output files: one for the module map and one for the input module paths.

Usage

generate-modmap <ddi-file> <cpp20modules-info-file> <output-file> <compiler>

Command Line Arguments

  • <ddi-file>: Path to the DDI file containing module dependencies.
  • <cpp20modules-info-file>: Path to the JSON file containing C++20 modules information.
  • <output-file>: Path to the output file where the module map will be stored.
  • <compiler>: Compiler type the modmap to use. Only clang, gcc, msvc-cl supported.

Example

generate-modmap ddi.json cpp20modules-info.json modmap clang

This command will generate two files:

  • modmap: containing the module map.
  • modmap.input: containing the module paths.

PikachuHyA avatar May 17 '24 07:05 PikachuHyA

hi @comius , I have updated this patch. please review again.

In this patch, the tools agg-ddi and gen-modmap are added. These two tools replace old native action Cpp20ModulesInfoAction and Cpp20ModuleDepMapAction. The code of tools is put under tools/cpp/cpp20modules_tools

PikachuHyA avatar Jun 13 '24 12:06 PikachuHyA

hi @comius , please review again.

Changes Summary

  • rename agg-ddi to aggregate-ddi
  • rename gen-modmap to generate-modmap
  • remove dependency nlohmann_json
  • implement a naive JSON parser, inspired by src/main/java/net/starlark/java/lib/json/Json.java
  • add JSON test, inspired by src/test/java/net/starlark/java/eval:testdata/json.sky

PikachuHyA avatar Jun 20 '24 06:06 PikachuHyA

hi @mathstuf

Due to the reimplementation with C++, the previous comments regarding the Java implementation can be ignored. If possible, please help review the C++ implementation.

Let's automatically replace "Java implementation" with "C++ implementation."

Does Java lack string formatting? from https://github.com/bazelbuild/bazel/pull/19940#discussion_r1608127731

Since C++ does not have an official format library, we can only use stream operations for now.

note: C++20 has a std::format, but bazel use C++17 now.

Are any escaping mechanisms required to be considered (e.g., spaces in the path)? from https://github.com/bazelbuild/bazel/pull/19940#discussion_r1608128110

Because Bazel cannot properly handle source files with spaces in the path, linking will fail, so I won't address this issue for now.

@comius is this a BUG or by design?

Consider the following cases:

  1. The workspace directory contains spaces, e.g., /tmp/test space Bazel will report an error:

    Starting local Bazel server and connecting to it...
    ERROR: bazel does not currently work properly from paths containing spaces (/tmp/test space).
    
  2. The filename contains spaces, e.g., bar space.cc Bazel will fail to link properly.

For example:

$tree
.
├── bar\ space.cc
├── BUILD.bazel
└── WORKSPACE.bazel

The content of BUILD.bazel:

cc_library(
  name = "bar",
  srcs = ["bar space.cc"]
)

Build the target bar, and it reports an error:

$bazel build :bar
INFO: Analyzed target //:bar (1 packages loaded, 2 targets configured).
INFO: Found 1 target...
ERROR: /tmp/test_space/BUILD.bazel:2:11: Linking libbar.a failed: (Exit 1): ar failed: error executing command (from target //:bar) /usr/bin/ar @bazel-out/k8-fastbuild/bin/libbar.a-2.params
Use --sandbox_debug to see verbose messages from the sandbox and retain the sandbox build root for debugging
/usr/bin/ar: bazel-out/k8-fastbuild/bin/_objs/bar/bar: No such file or directory
Target //:bar failed to build
Use --verbose_failures to see the command lines of failed build steps.
INFO: Elapsed time: 0.369s, Critical Path: 0.25s
INFO: 7 processes: 5 internal, 2 linux-sandbox.
FAILED: Build did NOT complete successfully

The content of bazel-out/k8-fastbuild/bin/libbar.a-2.params:

rcsD
bazel-out/k8-fastbuild/bin/libbar.a
bazel-out/k8-fastbuild/bin/_objs/bar/bar space.pic.o

For the future, header unit support would require reading use-source-path (bool) and lookup-method (enum). It might be prudent to read these and fail gracefully with a message about header unit non-support. from https://github.com/bazelbuild/bazel/pull/19940#discussion_r1608133842

I will consider supporting header units after merging the large patch. Currently, it only supports named modules, one-phase compilation for Clang, GCC, and MSVC, and two-phase compilation for Clang.

as you mentioned before ( https://github.com/bazelbuild/bazel/pull/19940#issuecomment-1817605194 getting things working across the ecosystem as a baseline before we start up our ricer cars.getting things working across the ecosystem as a baseline before we start up our ricer cars.),

as for https://github.com/bazelbuild/bazel/pull/19940#discussion_r1608128708 it seems that in C++, I don't need to worry about the encoding format. I use default config.

PikachuHyA avatar Jun 20 '24 07:06 PikachuHyA

@comius ping

PikachuHyA avatar Jul 01 '24 07:07 PikachuHyA

@trybka ping

PikachuHyA avatar Jul 17 '24 08:07 PikachuHyA

as https://github.com/bazelbuild/bazel/pull/22429#issuecomment-2241727714 mentioned

I'm testing https://github.com/bazelbuild/bazel/pull/22553 and find a bug in this patch

Changes

  • rebase to the latest master branch
  • fix empty provides or requires

The macOS CI failed, but it seems unrelated to this patch.

PikachuHyA avatar Jul 22 '24 11:07 PikachuHyA

hi @trybka

I expect we're going to want to put the tools behind semantics (in a similar fashion to grep-includes here).

Regarding placing these tools under semantics, do you have any suggestions? Currently, I've introduced two new tools: aggregate-ddi and generate-modmap, and exposed them to CcCompilationHelper via toolchains (_aggregate_ddi, _generate_modmap). The usage is based on the discussion at https://github.com/bazelbuild/bazel/pull/22427#discussion_r1622304165 ( You need to bring in an implicit dependency that points to such a tool. You can do that on cc_toolchain. Follow the example of _grep_include. )

PikachuHyA avatar Jul 23 '24 02:07 PikachuHyA

after rebase to the latest master branch, the CI passed.

PikachuHyA avatar Jul 25 '24 03:07 PikachuHyA

Because Bazel cannot properly handle source files with spaces in the path, linking will fail, so I won't address this issue for now.

@comius is this a BUG or by design?

This is a bug, Bazel can handle source files with spaces (but not with a ":"), C++ rules are to blame.

Regarding placing these tools under semantics, do you have any suggestions?

I'd suggest exposing a dict with 2 attributes from semantics, instead of 2 attr.labels. What we will do internally is put an empty dict there. There'a dict merge operator in starlark, so you can something like attrs = {....} | semantics.cpp_modules_tools.

comius avatar Jul 29 '24 10:07 comius

hi @comius

I'd suggest exposing a dict with 2 attributes from semantics, instead of 2 attr.labels. What we will do internally is put an empty dict there. There'a dict merge operator in starlark, so you can something like attrs = {....} | semantics.cpp_modules_tools.

I have updated the code.

PikachuHyA avatar Jul 30 '24 09:07 PikachuHyA

@comius @trybka can this go in now?

peakschris avatar Aug 04 '24 12:08 peakschris

@comius @trybka ping

PikachuHyA avatar Aug 12 '24 07:08 PikachuHyA

I apologize to @mathstuf for the mistake requested; I accidentally clicked the Re-request review button due to a network issue.

PikachuHyA avatar Aug 12 '24 07:08 PikachuHyA

hi @trybka , as https://github.com/bazelbuild/bazel/pull/22429#pullrequestreview-2258152911 mentioned, I checked the PR again and find one unresolved comment. I marked as resolved (see https://github.com/bazelbuild/bazel/pull/22427#discussion_r1730596288). please try import this patch again.

PikachuHyA avatar Aug 26 '24 03:08 PikachuHyA

I believe @comius is working on the import.

trybka avatar Aug 27 '24 01:08 trybka