[clangd] [C++20] [Modules] Introduce initial support for C++20 Modules

Open ChuanqiXu9 opened this issue 1 year ago • 37 comments

Alternative to https://reviews.llvm.org/D153114.

Try to address https://github.com/clangd/clangd/issues/1293.

See the links for design ideas and the consensus so far. We want to have some initial support in clang18.

This is the initial support for C++20 Modules in clangd. As suggested by sammccall in https://reviews.llvm.org/D153114, we should minimize the scope of the initial patch to make it easier to review and understand, so that everyone is on the same page:

> Don't attempt any cross-file or cross-version coordination: i.e. don't
> try to reuse BMIs between different files, don't try to reuse BMIs
> between (preamble) reparses of the same file, don't try to persist the
> module graph. Instead, when building a preamble, synchronously scan
> for the module graph, build the required PCMs on the single preamble
> thread with filenames private to that preamble, and then proceed to
> build the preamble.

This patch reflects the above opinions.
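
To make the quoted flow concrete, here is a minimal, self-contained sketch of the "scan, then build everything this preamble needs on one thread" idea. It is not code from the patch: ModuleGraph, buildOrder and privatePCMPath are hypothetical names, and the graph is assumed to have already been produced by some dependency scan.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for a scan result:
// module name -> names of the modules it imports directly.
using ModuleGraph = std::map<std::string, std::vector<std::string>>;

// Depth-first post-order walk: a module is appended only after all of its
// imports. Import cycles are ill-formed in C++; here they merely terminate
// (the Done set stops the recursion) without a meaningful order.
static void visit(const ModuleGraph &Graph, const std::string &Name,
                  std::set<std::string> &Done,
                  std::vector<std::string> &Order) {
  if (!Done.insert(Name).second)
    return; // already scheduled
  auto It = Graph.find(Name);
  if (It != Graph.end())
    for (const std::string &Dep : It->second)
      visit(Graph, Dep, Done, Order);
  Order.push_back(Name);
}

// Modules to build, in order, for a file that directly imports `Imports`.
// Everything would be built synchronously, one module after another.
std::vector<std::string> buildOrder(const ModuleGraph &Graph,
                                    const std::vector<std::string> &Imports) {
  std::set<std::string> Done;
  std::vector<std::string> Order;
  for (const std::string &Name : Imports)
    visit(Graph, Name, Done, Order);
  return Order;
}

// A file name private to one preamble build, so that no module file is
// shared between files or between reparses of the same file.
std::string privatePCMPath(const std::string &PreambleDir,
                           const std::string &ModuleName) {
  assert(!PreambleDir.empty() && "each preamble needs its own directory");
  std::string File = ModuleName;
  // Partition names contain ':', which is not portable in file names.
  std::replace(File.begin(), File.end(), ':', '-');
  return PreambleDir + "/" + File + ".pcm";
}
```

For an illustrative graph in which async_simple imports std and asio, and asio imports std, buildOrder(Graph, {"async_simple"}) yields std, asio, async_simple. Each entry would then be compiled to privatePCMPath(Dir, Name) before the preamble itself is built.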

Testing in a real-world project

I tested this with a modularized library: https://github.com/alibaba/async_simple/tree/CXX20Modules. This library has 3 modules (async_simple, std and asio) and 65 module units. (Note that a module may consist of multiple module units.) Both the std module and the asio module have 100k+ lines of code (maybe more, I didn't count), and async_simple itself has 8k lines of code. This is the scale of the project.

The result shows that it works pretty well, ..., well, except that I need to wait roughly 10s after opening or editing any file. This is within our expectations. We know it is hard to make it perfect on the first attempt.

What this patch does in detail

  • Introduced an option --experimental-modules-support to enable the support for C++20 Modules, so that no matter how bad this is, it won't affect current users. For the rest of this description, we assume the option is enabled.
  • Introduced two classes, ModuleFilesInfo and ModuleDependencyScanner. For now, ModuleDependencyScanner is only used by ModuleFilesInfo.
  • The class ModuleFilesInfo records the built module files for a specific single source file. The module files can only be built by the static member function ModuleFilesInfo::buildModuleFilesInfoFor(PathRef File, ...).
  • The class PreambleData gains a new member variable of type ModuleFilesInfo, which holds the module files needed by the current file. This means the module files info is part of the preamble, as was also suggested in the first patch.
  • In isPreambleCompatible(), we add a call to ModuleFilesInfo::CanReuse() to check if the built module files are still up to date.
  • When we build the AST for a source file, we will load the built module files from ModuleFilesInfo.
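
The bookkeeping described in the list above can be pictured with a small toy model. This is only a sketch under assumptions, not the class from the patch: ToyModuleFilesInfo, BuiltModuleFile and the "stamp" (an mtime or content hash) are hypothetical, and the real CanReuse() may check more than source staleness, for example whether the set of imported modules has changed. The last method shows clang's -fmodule-file=<module-name>=<path> form as one way a compile command could reference the built files; whether clangd passes flags or sets options directly is not stated here.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// One built module file plus what is needed to detect that it went stale.
struct BuiltModuleFile {
  std::string PCMPath;       // where the module file was written
  std::string SourcePath;    // module unit it was built from
  std::uint64_t SourceStamp; // hypothetical mtime or hash at build time
};

// Toy stand-in for ModuleFilesInfo: the module files needed by ONE source file.
class ToyModuleFilesInfo {
public:
  using StampFn = std::function<std::uint64_t(const std::string &)>;

  void add(const std::string &ModuleName, BuiltModuleFile File) {
    assert(!ModuleName.empty() && "module files are keyed by module name");
    Files[ModuleName] = std::move(File);
  }

  // True if every recorded module file was built from the current version
  // of its source. A preamble compatibility check could call this.
  bool canReuse(const StampFn &CurrentStamp) const {
    for (const auto &Entry : Files)
      if (CurrentStamp(Entry.second.SourcePath) != Entry.second.SourceStamp)
        return false;
    return true;
  }

  // Flags that tell the compiler where to find each named module.
  std::vector<std::string> moduleFileFlags() const {
    std::vector<std::string> Flags;
    for (const auto &Entry : Files)
      Flags.push_back("-fmodule-file=" + Entry.first + "=" +
                      Entry.second.PCMPath);
    return Flags;
  }

private:
  std::map<std::string, BuiltModuleFile> Files;
};
```

In this model, editing any module unit changes its stamp, canReuse() returns false, and the whole set is rebuilt together with the preamble. That matches the "no reuse across files or reparses" scope of this patch and is also where the roughly 10s wait comes from.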

What we need to do next

Let's split the TODOs into a clang part and a clangd part to make things clearer.

The TODOs in the clangd part include:

  1. Enable reusing module files across source files. This may require us to introduce something like a ModulesManager, which needs to handle scheduling, the possibility of BMI version conflicts and various events that can invalidate the module graph.
  2. Find a more efficient way to obtain the <module-name> -> <module-unit-source> map. Currently we always scan the whole project during ModuleFilesInfo::buildModuleFilesInfoFor(PathRef File, ...). This is clearly inefficient even if the scanning process is pretty fast. I think the potential solutions include:
    • Make a global scanner to monitor the state of every source file like I did in the first patch. The pain point is that we need to take care of the data races.
    • Ask the build systems to provide the map just like we ask them to provide the compilation database.
  3. Persist the module files, so that we can reuse module files across clangd invocations or even across clangd instances.
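
To illustrate what the <module-name> -> <module-unit-source> map in item 2 contains, here is a deliberately naive sketch that builds it from in-memory file contents. It is an assumption-laden toy: declaredModuleName and buildModuleMap are hypothetical names, and a line-based regular expression ignores the preprocessor, comments and attributes, which a real scanner has to handle.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <map>
#include <optional>
#include <regex>
#include <sstream>
#include <string>

// Name of the importable module unit declared in `Source`, if any.
// "module;" (global module fragment) and "module :private;" do not match.
std::optional<std::string> declaredModuleName(const std::string &Source) {
  static const std::regex Decl(
      R"(^\s*(export\s+)?module\s+([A-Za-z_][A-Za-z0-9_.]*(?:\s*:\s*[A-Za-z_][A-Za-z0-9_.]*)?)\s*;)");
  std::istringstream In(Source);
  std::string Line;
  std::smatch M;
  while (std::getline(In, Line)) {
    if (!std::regex_search(Line, M, Decl))
      continue;
    std::string Name = M[2].str();
    // Normalize "foo : part" to "foo:part".
    Name.erase(std::remove_if(Name.begin(), Name.end(),
                              [](unsigned char C) { return std::isspace(C); }),
               Name.end());
    assert(!Name.empty() && "the pattern requires an identifier");
    // A plain implementation unit ("module foo;") cannot be imported, so it
    // is not the source of foo's module file.
    if (!M[1].matched && Name.find(':') == std::string::npos)
      return std::nullopt;
    return Name;
  }
  return std::nullopt;
}

// Input: source path -> file contents. Output: module name -> source path.
std::map<std::string, std::string>
buildModuleMap(const std::map<std::string, std::string> &Sources) {
  std::map<std::string, std::string> Result;
  for (const auto &Entry : Sources)
    if (std::optional<std::string> Name = declaredModuleName(Entry.second))
      Result.emplace(*Name, Entry.first); // keep the first unit seen
  return Result;
}
```

Whichever approach is chosen, a global scanner or a map provided by the build system, the result has this shape: given a module name from an import, it answers which source file must be compiled to produce that module's file.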

TODOs in the clang part include:

  1. Clang should offer an option/mode to skip writing/reading the bodies of functions. Or we could even require the parser to skip parsing the function bodies.

It looks like we can call the support for C++20 Modules initially workable once (1) and (2) are done (or even without (2)).

CC: @HighCommander4 @ilya-biryukov

ChuanqiXu9, Sep 15 '23 06:09